THE LIVE ACCESS SERVER AND DODS:
Web Visualization and Data Fusion for Distributed Holdings
January 2001
Steve Hankin,1,* Jonathan Callahan,2 and Joseph Sirott
1 National Oceanic and Atmospheric Administration, Pacific Marine Environmental Laboratory, Seattle, Washington
2 Joint Institute for the Study of the Atmosphere and Ocean, University of Washington
ABSTRACT
Since 1994 the Live Access Server (LAS) has been providing visualization and subsetting of multi-dimensional scientific data for Web users. This talk presents a high-level overview of the capabilities of LAS version 4, a distributed "data fusion" system designed to support collaborative research.
LAS is designed to be easily installed, configured, and maintained. An individual LAS site can provide access both to locally held data sets and to distributed data -- often data sets juxtaposed for purposes of comparison. Users can co-plot and difference (with regridding as required) the comparative data sets. Binary access to remote data sets is provided transparently by the Distributed Ocean Data System (DODS).
An individual LAS can designate a cluster of cooperating sites as "sisters". LAS automatically configures a group of sisters to appear to users as a single (virtual) site. For example, distributed modeling sites can configure themselves as a collaborative project where all model outputs are available for comparison within a single interface.
The technologies upon which the Live Access Server is built are presented in greater detail in 10.6, Inside the Live Access Server.
1. INTRODUCTION
Since 1994 the Live Access Server (LAS) has been providing visualization and subsetting of multi-dimensional scientific data for Web users (Hankin et al., 1998). Chief among the initial design goals of LAS was to break through the data access barriers of file size, location, and format by providing three key areas of functionality: visualization, subsetting, and reformatting. On-the-fly visualization makes it possible to explore the data set entirely within a Web browser environment. Should further analysis at the desktop be desirable, subsetting allows the scientist to download small units of data that move efficiently on the Internet. Reformatting makes it simple to ingest the data into the scientist's choice of desktop environments.
Recent work on LAS has extended these initial goals to provide data support for collaborations among distributed researchers. LAS supports collaborative research activities by providing 1) common access to reference data sets (without undue duplication of effort at each site); 2) shared, mutual access (visualization and subsetting) to distributed data sets; and 3) the ability to inter-compare distributed data holdings.
2. SISTER SERVERS
An important characteristic of a collaborative research data system is that a scientist, while investigating data, may roam freely between data that his/her own organization is providing, data provided by other collaborating sites, and reference data sets of general interest (e.g. climatologies). We refer to this style of access as a shared virtual data base.
LAS creates a shared virtual data base as a natural extension to its modular design (Sirott et al., in press). Each LAS site is based upon a collection of configuration files encoded in XML (XML information, 2000). These configuration files contain all of the information required to request a product (a visualization or data subset) from an LAS server -- names of data sets and variables and specifications for the widgets needed to request subsets of those variables. Thus, the simple act of exchanging configuration files (via ftp or http) is sufficient to allow multiple LAS sites each to present a user interface to their complete mutual holdings. Within LAS we refer to sites connected in this way as "sister servers" (Fig. 1). Requests for a product from the Web browser interface are sent to the sister that "owns" the data set that contains the requested variable.
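The routing idea described above can be illustrated with a minimal Python sketch. It assumes a simplified, invented XML layout (the `site`, `dataset`, and `variable` elements, the `url` and `name` attributes, and the example host names are all hypothetical and are not the actual LAS configuration schema). Each site's configuration is merged into one catalog, and a request is directed to the site that owns the requested variable.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration files for two sister sites. The element and
# attribute names are invented for this sketch; they are not the LAS schema.
SITE_A_XML = """
<site url="http://site-a.example/las">
  <dataset name="coads_climatology">
    <variable name="sst"/>
    <variable name="wind_speed"/>
  </dataset>
</site>
"""

SITE_B_XML = """
<site url="http://site-b.example/las">
  <dataset name="model_run_1">
    <variable name="temp"/>
  </dataset>
</site>
"""


def build_catalog(config_texts):
    """Merge several site configurations into one lookup table.

    Returns a dict mapping (dataset, variable) to the URL of the owning site.
    """
    catalog = {}
    for text in config_texts:
        site = ET.fromstring(text)
        owner_url = site.get("url")
        for dataset in site.findall("dataset"):
            for variable in dataset.findall("variable"):
                key = (dataset.get("name"), variable.get("name"))
                catalog[key] = owner_url
    return catalog


def owning_site(catalog, dataset, variable):
    """Return the URL of the sister that owns the requested variable."""
    try:
        return catalog[(dataset, variable)]
    except KeyError:
        raise LookupError(f"no sister serves {dataset}/{variable}") from None


# Every sister can build the same catalog from the exchanged files.
catalog = build_catalog([SITE_A_XML, SITE_B_XML])
print(owning_site(catalog, "model_run_1", "temp"))
```

Because the catalog is derived entirely from the exchanged files, any sister in this sketch can present the combined holdings while still forwarding each request to the owner of the data.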
3. TRANSPARENT REMOTE ACCESS WITH DODS
Within a data-oriented research activity there are typically certain data sets that have the status of reference material. Examples of such data sets for the field of ocean-climate research are the COADS data set (Woodruff et al., 1987) and the World Ocean Atlas. A distributed collaborative project is likely to have a collection of more specialized data sets that have a similar status. For example, a distributed collaboration between tsunami inundation modelers may require uniform access to the collection of open ocean tsunami propagation runs that is used to generate the localized forcing fields.
Collaborative groups (and indeed entire research communities) incur huge inefficiencies through the duplicated efforts of managing reference data sets. The Distributed Ocean Data System (DODS) (Hankin, 2001) provides transparent access to remote data for existing applications. Thus the use of DODS makes it possible for Web servers such as LAS to provide access to reference data sets without incurring any of the costs associated with managing the data set.
Within LAS the use of a remote DODS data set differs from the use of a local data set only in the filename, which begins with "HTTP://" for a DODS data set. Sharing access to reference data sets is thereby reduced to exchanging the small XML files that describe their metadata.
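As a small illustration of this naming convention, the following Python sketch separates local from remote entries by inspecting the filename prefix. The function name, the example paths, and the host name are hypothetical. The case-insensitive comparison is an assumption made here because URL schemes are case-insensitive; it is not a statement about how LAS itself performs the check.

```python
def is_remote_dods(filename):
    """Return True when a configured filename points at a remote data set.

    Follows the convention stated above: a remote name begins with "HTTP://".
    The comparison ignores case (an assumption of this sketch).
    """
    return filename.strip().lower().startswith("http://")


# Hypothetical entries as they might appear in a site configuration.
datasets = {
    "local_model_output": "/data/model/run1.nc",
    "remote_reference": "http://server.example/dods/coads.nc",
}

for name, filename in datasets.items():
    kind = "remote (DODS)" if is_remote_dods(filename) else "local"
    print(f"{name}: {kind}")
```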
4. DISTRIBUTED DATA FUSION
The goal of collaborative research is to create a whole that is greater than the sum of its parts. Data systems must support this goal by helping to pull information from distributed data sets into merged calculations.
LAS version 4 has taken a first step in this direction by providing the ability to difference and to graphically overlay fields. For example, a scientist may compute the difference between a global sea surface temperature (SST) field from a reference data product and the corresponding field from his/her current model outputs. Or he/she may overlay SST with wind speeds. Or he/she may compare a diagnostic variable, such as vertical diffusion of heat, between two model runs. If the fields are on different grids, LAS will automatically regrid "variable 2" to the grid coordinates of "variable 1" through multi-linear interpolation. Differencing and overlaying operations are provided for all mutually relevant geometries of the data sets: lat-long maps, profiles, time series, etc.
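The regrid-then-difference operation can be sketched in a few lines of Python for the two-dimensional (lat-long) case, where multi-linear interpolation reduces to bilinear interpolation. This is a minimal illustration under stated assumptions, not the LAS implementation: the function names and sample values are hypothetical, both axes are assumed to be strictly increasing, and missing values and longitude wrap-around are ignored.

```python
from bisect import bisect_right


def _bracket(axis, x):
    """Return (i, t): the lower bracketing index and fractional position of x."""
    if not axis[0] <= x <= axis[-1]:
        raise ValueError(f"{x} lies outside the source axis")
    # Clamp so that x equal to the last axis point uses the final interval.
    i = min(bisect_right(axis, x) - 1, len(axis) - 2)
    t = (x - axis[i]) / (axis[i + 1] - axis[i])
    return i, t


def regrid_bilinear(src_lats, src_lons, src_field, dst_lats, dst_lons):
    """Interpolate src_field[lat][lon] onto the destination lat/lon axes."""
    out = []
    for lat in dst_lats:
        i, ty = _bracket(src_lats, lat)
        row = []
        for lon in dst_lons:
            j, tx = _bracket(src_lons, lon)
            # Interpolate along longitude on the two bracketing latitude rows,
            # then along latitude between those two results.
            lower = (1 - tx) * src_field[i][j] + tx * src_field[i][j + 1]
            upper = (1 - tx) * src_field[i + 1][j] + tx * src_field[i + 1][j + 1]
            row.append((1 - ty) * lower + ty * upper)
        out.append(row)
    return out


def difference(field_1, field_2):
    """Point-by-point difference of two fields on the same grid."""
    return [[a - b for a, b in zip(r1, r2)] for r1, r2 in zip(field_1, field_2)]


# "Variable 1" defines the target grid (hypothetical sample values).
lats_1, lons_1 = [0.0, 10.0, 20.0], [0.0, 20.0]
var_1 = [[1.0, 41.0], [11.0, 51.0], [21.0, 61.0]]

# "Variable 2" lives on a coarser grid and is regridded before differencing.
lats_2, lons_2 = [0.0, 20.0], [0.0, 20.0]
var_2 = [[0.0, 40.0], [20.0, 60.0]]

var_2_on_grid_1 = regrid_bilinear(lats_2, lons_2, var_2, lats_1, lons_1)
print(difference(var_1, var_2_on_grid_1))
```

Regridding the second variable onto the first variable's grid, as the text describes, means the difference is always reported on the coordinates of the field the user selected first.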
Since LAS sister sites combine shared metadata awareness (through exchanging XML configuration files) and transparent remote data access (through DODS), the ability to inter-compare files extends almost automatically across the distributed data holdings of the collaboration.
5. CONCLUSION
LAS has been designed as a modular, flexible system with an expectation that it must continually be adapted to new types of data and new disciplines. A table of some current LAS sites, which may be found at provides an overview of some of the current uses of LAS.
In the coming years we expect to be concentrating our efforts within the LAS project on: i) scalability to large numbers of data sets and variables; ii) flexibility with respect to data structure in order to support various in-situ data collections; iii) basic analysis capabilities (averages, integrals, etc.) to support collaborative model inter-comparison; and iv) rapid configuration of new data sets (including the challenges of incomplete metadata) and real-time update of configuration information.
6. REFERENCES
Hankin S., J. Davison, J. Callahan, D. E. Harrison, K. O'Brien, 1998: A configurable web server for gridded data: a framework for collaboration. In 14th International Conference on Interactive Information and Processing Systems for Meteorology, Oceanography, and Hydrology, AMS, 417-418.
Hankin S., 2001: DODS, The Distributed Ocean Data System. In 17th International Conference on Interactive Information and Processing Systems for Meteorology, Oceanography, and Hydrology, AMS (in press).
Sirott J., J. Callahan, S. Hankin, 2001: Inside the Live Access Server. In 17th International Conference on Interactive Information and Processing Systems for Meteorology, Oceanography, and Hydrology, AMS (in press).
Woodruff, S.D. et al., 1987: A Comprehensive Ocean-Atmosphere Data Set, Bull. Amer. Meteor. Soc., 68, 1239-1248.