Ferret cannot read data directly from a database server. Therefore, the LAS services that create products from scattered data sources all create an intermediate netCDF file. Such a product is created in two steps: first the scattered data service builds the intermediate netCDF file, then the Ferret backend service creates the plot.
An example file is given here. The output from ncdump -h insitu_example.nc is given below, followed by an explanation.
netcdf insitu_demo_netcdf {
dimensions:
    index = 7266 ;
    dim_one = 1 ;
    trdim = 2 ;
variables:
    double trdim(trdim) ;
        trdim:units = "hours since 1970-01-01 00:00:00" ;
        trdim:time_origin = "01-Jan-1970 00:00:00" ;
    double trange(trdim) ;
        trange:units = "hours" ;
    float NUMPROFS(dim_one) ;
        NUMPROFS:long_name = "Number of Profiles" ;
        NUMPROFS:units = "unitless" ;
    float NUMOBS(dim_one) ;
        NUMOBS:long_name = "Number of Observations" ;
        NUMOBS:units = "unitless" ;
    int CRUISE_ID(index) ;
        CRUISE_ID:long_name = "CRUISE ID" ;
        CRUISE_ID:units = "unitless" ;
        CRUISE_ID:missing_value = -999. ;
        CRUISE_ID:database_table = "Indian" ;
    float Depth(index) ;
        Depth:units = "meters" ;
        Depth:long_name = "Depth" ;
        Depth:missing = -999. ;
        Depth:database_table = "Indian" ;
    float Latitude(index) ;
        Latitude:missing = -999. ;
        Latitude:units = "degrees_north" ;
        Latitude:long_name = "Latitude" ;
        Latitude:database_table = "Indian" ;
        Latitude:_CoordinateAxisType = "Lat" ;
    float Longitude(index) ;
        Longitude:missing = -999.f ;
        Longitude:units = "degrees_east" ;
        Longitude:long_name = "Longitude" ;
        Longitude:database_table = "Indian" ;
        Longitude:_CoordinateAxisType = "Lon" ;
    double t(index) ;
        t:units = "hours since 1970-01-01 00:00:00" ;
        t:time_origin = "01-Jan-1970 00:00:00" ;
        t:long_name = "Time" ;
        t:missing = -999. ;
        t:database_table = "Indian" ;
        t:_CoordinateAxisType = "Time" ;
    float Salinity(index) ;
        Salinity:long_name = "Salinity: " ;
        Salinity:units = "" ;
        Salinity:missing_value = -999. ;
        Salinity:database_table = "Indian" ;
    float Temperature(index) ;
        Temperature:long_name = "Temperature: " ;
        Temperature:units = "umol/kg" ;
        Temperature:missing_value = -999. ;
        Temperature:database_table = "Indian" ;
    double PROF_ID(index) ;
        PROF_ID:long_name = "Profile ID" ;
        PROF_ID:units = "unitless" ;
        PROF_ID:missing_value = -999. ;

// global attributes:
        :Conventions = "LAS Intermediate netCDF File, Unidata Observation Dataset v1.0" ;
        :observationDimension = "index" ;
        :cdm_datatype = "Point" ;
        :geospatial_lat_min = -67. ;
        :geospatial_lat_max = 27.2670001983643 ;
        :geospatial_lon_min = 18.4500007629395 ;
        :geospatial_lon_max = 114.827003479004 ;
        :time_coverage_start = "69816.0 hours since 1970-01-01 00:00:00" ;
        :time_coverage_end = "157392.0 hours since 1970-01-01 00:00:00" ;
}
This file represents a database query that returned 7266 values of Temperature and Salinity at various locations and depths. All of the variables mentioned are required except for CRUISE_ID, which should be included only if your database supports queries based on cruise ID.
The (xax, yax, zax, tax, Temperature) variables, where xax, yax, zax, and tax stand generically for the longitude, latitude, depth, and time variables (Longitude, Latitude, Depth, and t in this example), are generated from values returned by the database, while the other variables are calculated on the fly by the code that creates the intermediate netCDF file. The following set of rules should be all that is required for you to create valid intermediate files; a sketch of the whole procedure follows the list:
- All times must be given in units of "hours since ..." with a time origin as specified in the tax:time_origin attribute.
- The trdim variable must contain the first and last times (in "hours since") in the data subset.
- All data variables ('Temperature' in this example, but multiple variables are possible) must specify the attributes "long_name", "units", and "missing_value".
- The xax variable must have units = "degrees_east".
- The yax variable must have units = "degrees_north".
- The PROF_ID variable must be assigned if it is not part of the database. For surface-only data the values of PROF_ID would range from (1...index).
- If you are dealing with actual profile data, all of the data associated with a single profile must be stored as consecutive elements of the array with monotonically increasing depth. For three profiles you might have the following values:

  PROF_ID = (1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3)
  zax = (0,5,10,20,30,50,100,200,500,1000,0,5,10,20,30,40,50,75,100,0,100,200,300,400,500)
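To make these rules concrete, here is a minimal sketch of building such a file with the Python netCDF4 package. It follows the structure and attribute names of the example above, but the six sample observations, their values, the output file name, and the "degrees_C" temperature units are all invented for illustration.

import numpy as np
from netCDF4 import Dataset

MISSING = -999.0

# Invented sample data: six observations forming three profiles, stored
# consecutively with monotonically increasing depth within each profile.
prof_id = np.array([1., 1., 2., 2., 3., 3.])
depth   = np.array([0., 10., 0., 20., 0., 5.])
lat     = np.array([-10.5, -10.5, 2.0, 2.0, 15.0, 15.0])
lon     = np.array([60.0, 60.0, 75.2, 75.2, 90.1, 90.1])
time_h  = np.array([70000., 70000., 71000., 71000., 72000., 72000.])
temp    = np.array([28.1, 25.4, 29.0, MISSING, 27.7, 27.5])

ds = Dataset("insitu_demo.nc", "w")
ds.createDimension("index", temp.size)
ds.createDimension("dim_one", 1)
ds.createDimension("trdim", 2)

def var(name, dtype, dims, **attrs):
    # Helper: create a variable and attach its attributes in one call.
    v = ds.createVariable(name, dtype, dims)
    v.setncatts(attrs)
    return v

# First and last times of the subset, in "hours since" the time origin.
var("trdim", "f8", ("trdim",),
    units="hours since 1970-01-01 00:00:00",
    time_origin="01-Jan-1970 00:00:00")[:] = [time_h.min(), time_h.max()]
var("trange", "f8", ("trdim",), units="hours")[:] = [time_h.min(), time_h.max()]

# Counts used to label the output plot.
var("NUMPROFS", "f4", ("dim_one",), long_name="Number of Profiles",
    units="unitless")[:] = np.unique(prof_id).size
var("NUMOBS", "f4", ("dim_one",), long_name="Number of Observations",
    units="unitless")[:] = temp.size

# Axis variables; the longitude/latitude units are mandatory.
var("Longitude", "f4", ("index",), missing=MISSING, units="degrees_east",
    long_name="Longitude", _CoordinateAxisType="Lon")[:] = lon
var("Latitude", "f4", ("index",), missing=MISSING, units="degrees_north",
    long_name="Latitude", _CoordinateAxisType="Lat")[:] = lat
var("Depth", "f4", ("index",), units="meters", long_name="Depth",
    missing=MISSING)[:] = depth
var("t", "f8", ("index",), units="hours since 1970-01-01 00:00:00",
    time_origin="01-Jan-1970 00:00:00", long_name="Time",
    missing=MISSING, _CoordinateAxisType="Time")[:] = time_h

# Data variable(s): long_name, units, and missing_value are required.
var("Temperature", "f4", ("index",), long_name="Temperature",
    units="degrees_C", missing_value=MISSING)[:] = temp

var("PROF_ID", "f8", ("index",), long_name="Profile ID",
    units="unitless", missing_value=MISSING)[:] = prof_id

ds.setncatts({
    "Conventions": "LAS Intermediate netCDF File, Unidata Observation Dataset v1.0",
    "observationDimension": "index",
    "cdm_datatype": "Point",
    "geospatial_lat_min": lat.min(), "geospatial_lat_max": lat.max(),
    "geospatial_lon_min": lon.min(), "geospatial_lon_max": lon.max(),
    "time_coverage_start": "%.1f hours since 1970-01-01 00:00:00" % time_h.min(),
    "time_coverage_end": "%.1f hours since 1970-01-01 00:00:00" % time_h.max(),
})
ds.close()

Running ncdump -h on the result should show the same skeleton as the example above, with index = 6.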
Answers to some questions about the format:
Why is NUMPROFS a float but PROF_ID is a double?
Don't they hold the same range of values?
But the bigger question is: why aren't they both integers, which seems to be a more appropriate data type?
Why isn't NUMOBS an integer?
It doesn't matter whether these values are 'int' or 'float'. Ferret converts 'int' to 'float' upon ingest.
What is NUMOBS? Is it the number of observations per profile?
Is it the total number of valid (i.e., non-missing value) values?
This is the total number of observations at all levels. This number is used to label the output plot.
Must zax units be "meters"?
The zax variable can be in whatever units you want. If the units are understood by the UDUNITS package from UNIDATA then the z axis will be labeled appropriately.
Must all times be Zulu (not local) times, or is the time zone not specified by the standard?
Yes. Ferret is not time-zone aware (it was developed for climatological datasets), so you must make any time zone adjustments yourself if that is important to you.
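If your source database stores local times, one way to make that adjustment is with time-zone-aware arithmetic before writing the file. Here is a sketch using Python's standard library; the timestamp and its UTC-8 offset are invented for illustration:

from datetime import datetime, timezone, timedelta

epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)

# An observation recorded in local time at UTC-8 (invented example).
local = datetime(1977, 12, 15, 6, 0, tzinfo=timezone(timedelta(hours=-8)))

# Subtracting two zone-aware datetimes adjusts to UTC automatically.
hours_since = (local - epoch).total_seconds() / 3600.0
print(hours_since)  # 69734.0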
The variables PROF_ID, CRUISE_ID (if present), xax, yax, zax, tax, Temperature all allow missing values. But isn't a row of data useless if any of those values are missing?
Yes, but we don't require that you excise all of the missing data. Ferret won't plot missing data, so there is less burden on the code that creates the intermediate file.
"If you are dealing with actual profile data, all of the data ..." I understand that all of the data that is present must be in sorted order. But, must all of the data be present (i.e., can a row be missing)? Or must missing data be represented by a row of data with a missing_value for the response variable?
Ferret will plot whatever data you give it and will ignore data marked with the "missing_value" flag. Just make sure that all of the variables dimensioned with (index) match up.
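For instance (invented values), a row whose data value is unknown keeps its place in every (index)-dimensioned array and simply carries the missing_value flag:

import numpy as np

MISSING = -999.0
depth       = np.array([0.0, 10.0, 20.0])
temperature = np.array([28.1, MISSING, 25.4])  # the middle row won't be plotted

# Ferret does this filtering itself; shown here only to illustrate the effect.
plottable = temperature != MISSING
print(depth[plottable], temperature[plottable])  # [ 0. 20.] [28.1 25.4]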
Just curious, are CRUISE_IDs standardized? Do the CRUISE_ID values represent values from one of several CRUISE_ID nomenclatures?
Various science groups and data management sites have adopted different CRUISE_ID nomenclatures. We are just providing a placeholder here, knowing that CRUISE_ID is often something folks want to see.
Why are the variable names so terse (e.g., "xax" instead of "xAxis", "trdim" instead of ?)? Is saving 2 bytes in the file really worth decreased readability?
Why the different naming conventions (e.g., some variables are all capital letters and some are all lower case)? Is there some significance to the differences?
Here we see the institutional disease of having several programmers working with the same variables over and over and over again -- the names used to refer to those variables tend to get shorter and shorter. And each new programmer pretty much goes along with the parts of the code that work. Unfortunately, code cleanup is rarely assigned as a high priority task.
For time_origin values, why is a non-standard date format (e.g., "01-Jan-1977") used instead of the ISO standard ("1977-01-01")?
Ferret has its own idiosyncratic time format, but it turns out that you don't even need to specify a time origin (T0) if you are using time units of "~units~ since ~ISO time~".
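For example, here is a self-contained sketch (again assuming the Python netCDF4 package, with invented file and dimension sizes) of a time variable whose origin lives entirely in the units string:

from netCDF4 import Dataset

ds = Dataset("iso_origin_example.nc", "w")
ds.createDimension("index", 3)
t = ds.createVariable("t", "f8", ("index",))
t.units = "hours since 1977-01-01 00:00:00"  # ISO origin embedded in the units
# ...no separate t:time_origin attribute is needed.
t[:] = [0.0, 24.0, 48.0]
ds.close()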
If trdim is the time range (two values: min and max time in the file), what is trange?
Looking at comments in the code, I see that 'trdim' and 'trange' have exactly the same contents. Various plotting scripts use the values of 'trange', but none seem to use 'trdim'. I expect that 'trdim' could be left out.