An Ensemble Axis Aggregation
Basic Aggregation Setup for the Time Axis
To use the ensemble facilities in LAS, you first have to prepare your data collection so that you have one data source for each ensemble member. That means that if one ensemble run consists of several files covering different time ranges, you will first need to create a time aggregation of those data. Once you have a collection of files or OPeNDAP aggregations in which each one represents one ensemble member, you can create the aggregation along the ensemble axis.
In this example, we'll walk you through the process of creating a time aggregation and the ensemble aggregation together in one NCML file. The example presented here is a small subset of the full ensemble, but it should serve to illustrate all of the steps.
We begin with data files for two parameters, covering three years.
tas_CLIVAR_atm_monthly.198204-198303.nc
tas_CLIVAR_atm_monthly.198304-198403.nc
tas_CLIVAR_atm_monthly.198404-198503.nc
zg_CLIVAR_atm_monthly.198204-198303.nc
zg_CLIVAR_atm_monthly.198304-198403.nc
zg_CLIVAR_atm_monthly.198404-198503.nc
These data represent one of the ensemble runs. We will end up creating an ensemble of three such runs, but first we have to get these files organized along the time axis. The NCML facilities in the Java netCDF library allow you to easily create a time aggregation of these data.
Below is the bare bones XML needed to create the time aggregation.
<dataset ID="CM2.1U_CDAef_v1.0_apf r1 Atmosphere" name="CM2.1U_CDAef_v1.0_apf r1 Atmosphere" urlPath="CM2.1U_CDApf_v1.0_r1Atmos_wo_vars">
<dataType>Grid</dataType>
<property name="viewer" value="http://data1.gfdl.noaa.gov:8380/lasV7/getUI.do?data_url=http://data1.gfdl.noaa.gov:8380/thredds/dodsC/CM2.1U_CDApf_v1.0_r1Atmos, Visualize with Live Access Server"/>
<serviceName>ipcc</serviceName>
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
<aggregation type="union">
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
<aggregation dimName="time" type="joinExisting" timeUnitsChange="true">
<scan location="/home/users/rhs/clivar/gfdl_cm2_1/CM2.1U_CDAef_v1.0_apf/r1/pp/atmos/ts/monthly" suffix="tas_*.nc"/>
</aggregation>
</netcdf>
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
<aggregation dimName="time" type="joinExisting" timeUnitsChange="true">
<scan location="/home/users/rhs/clivar/gfdl_cm2_1/CM2.1U_CDAef_v1.0_apf/r1/pp/atmos/ts/monthly" suffix="zg_*.nc"/>
</aggregation>
</netcdf>
</aggregation>
</netcdf>
</dataset>
You should take note of several things about the XML above. First, the inner <aggregation> element with type="joinExisting" is where the files are organized along the time axis. As it turns out, each of these files uses a different base date for the time units, so timeUnitsChange="true" is required to tell the software that it must extract the units from each file when computing the date/time values within the time axis of that file.
To further organize the data, we use an outer type="union" aggregation so that both data parameters are available from a single URL. This is not strictly necessary to work with these data within LAS, since each variable can have its own data source URL, but it certainly makes access to the data easier for other clients.
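To make the base-date issue concrete, here is a minimal Python sketch of the kind of conversion that has to happen when each file counts days from its own base date. It is only an illustration of the idea, not the netCDF library's implementation. The helper names (parse_units, rebase) and the per-file time values are hypothetical.

```python
from datetime import datetime, timedelta

def parse_units(units):
    """Return the base date of a 'days since YYYY-MM-DD hh:mm:ss' units string."""
    prefix = "days since "
    if not units.startswith(prefix):
        raise ValueError(f"unsupported units: {units!r}")
    return datetime.strptime(units[len(prefix):], "%Y-%m-%d %H:%M:%S")

def rebase(values, units, target_units):
    """Re-express day offsets given in `units` as day offsets in `target_units`."""
    shift = (parse_units(units) - parse_units(target_units)) / timedelta(days=1)
    return [v + shift for v in values]

# Hypothetical per-file time values: each file restarts its count of days.
files = [
    ("days since 1982-04-01 00:00:00", [15.0, 45.5]),
    ("days since 1983-04-01 00:00:00", [15.0, 45.5]),
]
target = "days since 1982-04-01 00:00:00"

joined = []
for units, values in files:
    joined.extend(rebase(values, units, target))
print(joined)
```

Without this kind of per-file conversion, the second file's values would repeat the first file's values and the joined axis would not increase monotonically.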
The Complete Aggregation Setup for the Time Axis
We might expect that the setup above would be all that's necessary to prepare each ensemble run, but it turns out that we can do better. In its current implementation, the Unidata CDM library does not take the CF time bounds variable into account when aggregating along the time axis. As a result, for files like these where the time axis "starts over" with a new base date in each file, the time bounds variable will be aggregated, but its values will be wrong. There are also some problems with the automatic generation of the time values used to construct the virtual time axis in the aggregation. To work around these limitations, we can get an even better representation of the data by using the power of NCML to specify exactly what values to use for the time axis and the time bounds variable.
<dataset ID="CM2.1U_CDAef_v1.0_apf r1 Atmosphere" name="CM2.1U_CDAef_v1.0_apf r1 Atmosphere" urlPath="CM2.1U_CDApf_v1.0_r1Atmos">
<dataType>Grid</dataType>
<property name="viewer" value="http://data1.gfdl.noaa.gov:8380/lasV7/getUI.do?data_url=http://data1.gfdl.noaa.gov:8380/thredds/dodsC/CM2.1U_CDApf_v1.0_r1Atmos, Visualize with Live Access Server"/>
<serviceName>ipcc</serviceName>
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
<aggregation type="union">
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
<variable name="time" shape="time" type="float">
<attribute name="units" value="days since 1982-04-01 00:00:00"/>
<attribute name="bounds" value="time_bnds"/>
<attribute name="_CoordinateAxisType" value="Time"/>
<values start="15" increment="30.5"/>
</variable>
<variable name="time_bnds" shape="time bnds" type="float">
<values>
0 30
30 61
61 91
91 122
122 153
153 183
183 214
214 244
244 275
275 306
306 334
334 365
365 395
395 426
426 456
456 487
487 518
518 548
548 579
579 609
609 640
640 671
671 700
700 731
731 761
761 792
792 822
822 853
853 884
884 914
914 945
945 975
975 1006
1006 1037
1037 1065
1065 1096
</values>
</variable>
<aggregation dimName="time" type="joinExisting" timeUnitsChange="true">
<scan location="/home/users/rhs/clivar/gfdl_cm2_1/CM2.1U_CDAef_v1.0_apf/r1/pp/atmos/ts/monthly" suffix="tas_*.nc"/>
</aggregation>
</netcdf>
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
<variable name="time" shape="time" type="float">
<attribute name="units" value="days since 1982-04-01 00:00:00"/>
<attribute name="bounds" value="time_bnds"/>
<attribute name="_CoordinateAxisType" value="Time"/>
<values start="15" increment="30.5"/>
</variable>
<variable name="time_bnds" shape="time bnds" type="float">
<values>
0 30
30 61
61 91
91 122
122 153
153 183
183 214
214 244
244 275
275 306
306 334
334 365
365 395
395 426
426 456
456 487
487 518
518 548
548 579
579 609
609 640
640 671
671 700
700 731
731 761
761 792
792 822
822 853
853 884
884 914
914 945
945 975
975 1006
1006 1037
1037 1065
1065 1096
</values>
</variable>
<aggregation dimName="time" type="joinExisting" timeUnitsChange="true">
<scan location="/home/users/rhs/clivar/gfdl_cm2_1/CM2.1U_CDAef_v1.0_apf/r1/pp/atmos/ts/monthly" suffix="zg_*.nc"/>
</aggregation>
</netcdf>
</aggregation>
</netcdf>
</dataset>
The resulting file looks complicated, but that is mostly because we are forced to list each value for the time bounds array. Of course, if you have a large data collection this could be quite tedious, and you might have to consider some automated method to generate these values. Using the same name as the existing time_bnds variable causes the values in the aggregation to be replaced by the values we supply in the NCML.
With the time axis, the changes are even simpler. We can specify the times we want by assuming regularly spaced data starting in the middle of the first month, with an increment of 30.5 days between values. This lands each axis value close to the middle of its month, and the time bounds specify the precise interval covered by each data grid along the time axis. We also used NCML to specify some attribute values for the time axis.
Once this process is repeated for each of the three runs, we can go on to the next step of preparing the ensemble aggregation.
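As one possible automated method, the following Python sketch generates the monthly bounds as days since the first day of the starting month. It assumes the standard (Gregorian) calendar, which is consistent with the values listed above, where February 1984 spans 29 days. The function name monthly_bounds is hypothetical.

```python
from datetime import date

def monthly_bounds(start_year, start_month, n_months):
    """Return [(lower, upper), ...] in days since the first day of the start month."""
    base = date(start_year, start_month, 1)
    bounds = []
    year, month = start_year, start_month
    for _ in range(n_months):
        lower = (date(year, month, 1) - base).days
        # Advance to the first day of the following month.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
        upper = (date(year, month, 1) - base).days
        bounds.append((lower, upper))
    return bounds

# Three years of monthly data starting in April 1982, as in the example files.
bounds = monthly_bounds(1982, 4, 36)
print("\n".join(f"{lo} {hi}" for lo, hi in bounds))
```

The printed lines can be pasted between the <values> tags of the time_bnds variable. A model that uses a different calendar (for example, a 365-day no-leap calendar) would need a different rule for month lengths.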
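Because 30.5 days is only an average month length, it is worth checking that the regularly spaced values stay inside their intervals. The Python sketch below does that for the 36 months of this example, using the interval edges from the time_bnds listing above. The variable names are illustrative only.

```python
# Interval edges copied from the time_bnds values in the NCML above.
edges = [
    0, 30, 61, 91, 122, 153, 183, 214, 244, 275, 306, 334, 365,
    395, 426, 456, 487, 518, 548, 579, 609, 640, 671, 700, 731,
    761, 792, 822, 853, 884, 914, 945, 975, 1006, 1037, 1065, 1096,
]

# Same rule as <values start="15" increment="30.5"/>.
start, increment = 15.0, 30.5
axis = [start + increment * i for i in range(len(edges) - 1)]

# Every axis value should fall inside its own [lower, upper) interval.
inside = all(lo <= t < hi for t, lo, hi in zip(axis, edges, edges[1:]))

# How far the regular spacing drifts from the true interval midpoints.
midpoints = [(lo + hi) / 2 for lo, hi in zip(edges, edges[1:])]
max_offset = max(abs(t - m) for t, m in zip(axis, midpoints))

print(inside, max_offset)
```

For a much longer record the drift can grow, so a check like this helps decide whether to keep the regular spacing or list explicit time values instead.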
Assembling the Ensemble Runs into an Aggregation
Once the above work is installed into a THREDDS Data Server for each ensemble run, you will have one data access URL for each run. The result can then be further aggregated with an additional axis. In this case you are creating a new axis so the aggregation type will be type="joinNew".
<?xml version="1.0"?>
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
<dimension name="ensemble" length="3"/>
<variable name="ensemble" type="double">
<attribute name="long_name" value="Ensemble of Realizations"/>
<attribute name="_CoordinateAxisType" value="Ensemble"/>
<attribute name="axis" value="E"/>
<attribute name="standard_name" value="realization"/>
</variable>
<variable shape="ensemble" name="labels" type="String">
<attribute name="long_name" value="Realizations"/>
<values>Realization01 Realization02 Realization03</values>
</variable>
<variable name="plev">
<attribute name="positive" value="down"/>
</variable>
<aggregation dimName="ensemble" type="joinNew">
<variableAgg name="tas"/>
<variableAgg name="zg"/>
<netcdf location="http://dunkel.pmel.noaa.gov:8930/thredds/dodsC/CM2.1U_CDApf_v1.0_r1Atmos" coordValue="1"/>
<netcdf location="http://dunkel.pmel.noaa.gov:8930/thredds/dodsC/CM2.1U_CDApf_v1.0_r2Atmos" coordValue="2"/>
<netcdf location="http://dunkel.pmel.noaa.gov:8930/thredds/dodsC/CM2.1U_CDApf_v1.0_r3Atmos" coordValue="3"/>
</aggregation>
</netcdf>
One word of caution: if you are aggregating existing files that have time as the unlimited dimension, you will have to modify the existing time axis using NCML to ask the server not to treat it as the unlimited dimension. Time will no longer be the outer dimension, and without this change the standard client library will not be able to read the data source.
In the case of this aggregation, we build the actual aggregation axis using a variable of data type double and supply the value of the coordinate when specifying the URL of each aggregated ensemble run. When building the ensemble axis we also supply some attributes, <attribute name="_CoordinateAxisType" value="Ensemble"/>, <attribute name="axis" value="E"/>, and <attribute name="standard_name" value="realization"/> all of which serve to identify this axis as an ensemble axis to various software clients including LAS.
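If you have many ensemble runs, the joinNew element with its coordValue entries can be generated rather than typed by hand. The Python sketch below builds only the aggregation part of the NCML shown above, assigning coordinate values 1, 2, 3, and so on, in the order the URLs are given. The function name ensemble_ncml is hypothetical, and the variable definitions and attributes from the full example would still need to be added.

```python
import xml.etree.ElementTree as ET

NCML_NS = "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"

def ensemble_ncml(run_urls, variables):
    """Build a joinNew aggregation over run_urls for the named variables."""
    root = ET.Element("netcdf", {"xmlns": NCML_NS})
    agg = ET.SubElement(root, "aggregation",
                        {"dimName": "ensemble", "type": "joinNew"})
    # Each aggregated variable gains the new ensemble dimension.
    for name in variables:
        ET.SubElement(agg, "variableAgg", {"name": name})
    # One nested <netcdf> per run; coordValue is the run's position on the axis.
    for i, url in enumerate(run_urls, start=1):
        ET.SubElement(agg, "netcdf", {"location": url, "coordValue": str(i)})
    return ET.tostring(root, encoding="unicode")

base = "http://dunkel.pmel.noaa.gov:8930/thredds/dodsC/CM2.1U_CDApf_v1.0_r{}Atmos"
xml_text = ensemble_ncml([base.format(n) for n in (1, 2, 3)], ["tas", "zg"])
print(xml_text)
```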
In addition, the CF standard describes a way to provide text labels for an ensemble axis by creating a character variable with the same dimension as the actual coordinate variable of the axis. If you add such a variable, LAS will use it when building the user interface for interacting with this data set.
Finally, we fix the plev vertical coordinate so that it has the positive="down" attribute, a requirement for CF, so the entire collection can be recognized as a scientific data grid in the CDM software. Now that we have our aggregations built, we can configure LAS to use the ensemble.