Chapter 8: WORKING WITH SPECIAL DATA SETS
Ch8 Sec1. WHAT IS NON-GRIDDED DATA?
Many data sets which are not normally regarded as "gridded" can nonetheless be managed, analyzed, and visualized effectively in a gridded data framework. Track lines, "point data", etc. are common examples of "non-gridded" data. Profiles and time series, although they are individually simple one-dimensional grids, have a non-gridded structure when considered as a collection, and treating them as a collection is often essential.
This chapter addresses a number of classes of non-gridded data sets and offers approaches that make it straightforward to work with these data types in Ferret's gridded data framework. The approaches are all conceived to facilitate a fusion of these data types so that multiple data types may be easily combined in calculations.
Ch8 Sec2. POINT DATA
"Point data" refers to collections of values at scattered locations and times. An example would be the column burden of oceanic NO3 and the scattered locations and times at which the measurements were made.
In a gridded context, point data is best viewed as a collection of 1-dimensional variables, where the axis of each variable is the index value, 1, 2, 3, ..., of the individual point in the scatter. Thus, continuing our example of an oceanic NO3 data set, we would want to view this as four variables (longitude, latitude, date, and burden), where each variable is defined on a one-dimensional axis of sample number. Typically, this sort of data is organized in a table of the form
| Index | longitude | latitude | year | month | day  | NO3 |
| 1     | 160       | 30       | 1968 | 11    | -999 | 6.2 |
| 2     | 33.1      | 60.2     | 1992 | 5     | 13   | 5.5 |
| ...   |
Ch8 Sec2.1. Getting point data into Ferret
Point data sets are most commonly available in table form, where the columns of the table are the variables and each row of the table is a separate point. In the chapter "Data Set Basics", section "Reading ASCII Files" (p. 46), example 2 and subsequent examples show how such a file might be read into Ferret.
For example, let us suppose that the file above is introduced to Ferret with the command
yes? FILE/VAR="index,lon,lat,yr,mn,day,NO3"/SKIP=1 my_data_file.dat
yes? SHOW DATA my_data_file.dat
     currently SET data sets:
    1> ./my_data_file.dat  (default)
 name     title     I          J      K      L
 LON      LON       1:20480    ...    ...    ...
 LAT      LAT       1:20480    ...    ...    ...
 YR       YR        1:20480    ...    ...    ...
 MN       MN        1:20480    ...    ...    ...
 DAY      DAY       1:20480    ...    ...    ...
 NO3      NO3       1:20480    ...    ...    ...
Note that the SET VARIABLE command would normally be used as well to assign titles, units, and missing value flags to the variables.
Also note that until the first data is actually requested from the file, Ferret does not know the size of the file. The /GRID= option may be used to tell Ferret what size to expect. Lacking a /GRID specification, the "1:20480" shown is the size of the default grid "EZ". After the first data access, SHOW GRID will reveal the true size of the file instead. If the size still appears to be 20480, the default grid EZ may not have been large enough, and the /GRID qualifier must be used to pre-allocate sufficient space.
Ch8 Sec2.2. How point data is structured in Ferret
In table form (above) each column represents a dependent variable; the column for "burden" and the column for "latitude" have equal status. In many cases this is an adequate representation. For example, a plot of NO3 burden versus latitude could be produced with the command
yes? PLOT/VS lat, NO3
To combine point data organized in tables with gridded data sources, say a gridded field of oceanic temperature, two approaches are available. Either the gridded data may be viewed in the structure of the table, or the scattered data may be viewed in a geo-referenced 1-dimensional grid structure. The problem to be solved determines which approach is suitable. The sections "Subsampling gridded fields onto point locations and times" and "Defining gridded variables from point data" below describe these two approaches.
Ch8 Sec2.2.1. Working with dates
Ferret V5.0 does not understand formatted dates inside of generic ASCII data files. To use the dates intelligibly inside of Ferret:
1. Break the year, month, and day fields out separately, or provide a Julian day. SET DATA/FORMAT=DELIMITED (p. 402) is helpful for inputting date information.
2. If needed, create a Julian date from year, month, and day using the function DAYS1900. If a time origin other than 1-jan-1900 is needed, subtract DAYS1900(year0, mon0, day0). For help in creating the dates, see the FAQ, "How can I create a time axis from variables containing year, month, day, etc?" at http://ferret.pmel.noaa.gov/FERRET_17sep07/FAQ/axes_and_data/time_axis_from_variables.html
3. If needed, create an axis of dates in the same way as the latitude axis in the example under "Defining gridded variables from point data".
See the chapter "Grids and Regions", section "Time" (p. 164) and the section in the chapter "Converting to NetCDF" on "Converting time word data to numerical data" (p. 282) for details of creating time axes.
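The day-count arithmetic described in item 2 can be illustrated outside of Ferret. The following Python sketch is only an analogue: it counts whole days elapsed from a chosen origin and is not claimed to reproduce the exact offset convention of DAYS1900. The helper names and the use of -999 as a missing flag (taken from the sample table) are assumptions made for illustration.

```python
from datetime import date

MISSING = -999  # missing-value flag as it appears in the sample table

def days_since(year, month, day, origin=date(1900, 1, 1)):
    """Whole days elapsed from `origin` to the given calendar date."""
    return (date(year, month, day) - origin).days

def row_to_days(year, month, day, origin=date(1900, 1, 1)):
    """Day count for one table row, or None if any date field is missing."""
    if MISSING in (year, month, day):
        return None
    return days_since(year, month, day, origin)

# (year, month, day) columns of the two sample rows
rows = [(1968, 11, -999), (1992, 5, 13)]
day_numbers = [row_to_days(*r) for r in rows]

# Shifting the time origin is a subtraction of the origin's own day count
shifted = days_since(1992, 5, 13) - days_since(1992, 1, 1)
```

The first sample row has no usable day field, so it yields no day count; rows like this must be flagged as missing before a time axis is built from the result.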
Ch8 Sec2.3. Subsampling gridded fields onto point locations and times
Ferret can be used as a tool to extract variables from gridded data sets at time/space locations to match the scatter of the point data. In this form they may, effectively, be combined into the table of data read from the ASCII (or binary) file. For example, suppose we want to obtain values of sea surface temperature at the locations of our NO3 samples, from a climatological annual average SST field. This may be accomplished simply with
yes? USE coads_climatology
yes? LET ssttav = sst[l=1:12@ave]
yes? LET my_lon = lon[d=my_data_file.dat]
yes? LET my_lat = lat[d=my_data_file.dat]
yes? LET sst_xy = SAMPLEXY(ssttav, my_lon, my_lat)
Suppose that instead we defined our XY sampling based upon the 12 month time series of SST grids as in
yes? LET sst_xy = SAMPLEXY(sst, my_lon, my_lat)
The variable sst_xy as defined above would then have a two-dimensional structure: sample index by 12 months. To sample this in time we use
yes? LET zero = 0 + 0*mn
yes? LET sst_t = SAMPLET_DATE(sst_xy,zero,mn,day,zero,zero,zero)
Note that the year is entered simply as 0, since SST is a climatological variable, and the variable "zero" is the same length as the variables "mn" and "day". To sample a field at hour 12 of each day, we could use "zero+12" for the hours argument.
In this example we sampled a field in X, Y, and T. The sst data was sampled at each time. If we were sampling a field which had a Z axis, that axis would be inherited from the first argument to SAMPLEXY in the same way; it would be sampled at the (x,y) points at each Z level.
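To make the idea of sampling a gridded field at scattered (x,y) locations concrete, here is a minimal Python sketch that assumes bilinear interpolation on a rectilinear grid. It is an illustration of the concept only, not the implementation of SAMPLEXY; the function name sample_xy and the test field are hypothetical, and the sketch ignores longitude wrap-around and missing values in the field.

```python
from bisect import bisect_right

def sample_xy(field, xs, ys, px, py):
    """Bilinear sample of field[j][i], defined at (xs[i], ys[j]), at each (px[n], py[n])."""
    out = []
    for x, y in zip(px, py):
        if not (xs[0] <= x <= xs[-1] and ys[0] <= y <= ys[-1]):
            out.append(None)                      # point lies outside the grid
            continue
        # index of the grid cell containing the point (clamped at the upper edge)
        i = min(bisect_right(xs, x) - 1, len(xs) - 2)
        j = min(bisect_right(ys, y) - 1, len(ys) - 2)
        fx = (x - xs[i]) / (xs[i + 1] - xs[i])
        fy = (y - ys[j]) / (ys[j + 1] - ys[j])
        lower = field[j][i] * (1 - fx) + field[j][i + 1] * fx
        upper = field[j + 1][i] * (1 - fx) + field[j + 1][i + 1] * fx
        out.append(lower * (1 - fy) + upper * fy)
    return out

xs = [0.0, 10.0, 20.0]
ys = [0.0, 10.0]
field = [[x + 2 * y for x in xs] for y in ys]     # a linear test field
samples = sample_xy(field, xs, ys, [5.0, 20.0, 25.0], [5.0, 10.0, 0.0])
```

The result is a 1-dimensional list indexed by sample number, which is the same structure as the columns of the point data table.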
Ch8 Sec2.4. Defining gridded variables from point data
There are functions to interpolate scattered data onto a grid. See the scat2gridgauss and scat2gridlaplace functions (p. 95ff). These functions map irregular locations to a regular grid.
For some calculations one may want to let Ferret know which of the variables are dependent (measurements) and which are independent (coordinates). For example, suppose we wish to compute the average column burden of NO3 as a function of latitude. Burden here is an integral of the concentration NO3 over depth. We will want to see our variable burden on an axis of latitude.
The steps to do this are
1. In general, the latitude variable will not be sorted into the strictly increasing order needed to create an axis. Determine the sorting order for latitude using
yes? LET lat_index = SORTI(lat)
2. Create a latitude grid
yes? DEFINE AXIS/FROM/NAME=lat_ax/Y/UNITS=degrees SAMPLEI(lat, lat_index)
yes? DEFINE GRID/Y=lat_ax glat
yes? LET NEW = Y[g=glat] ! a dummy variable to use in RESHAPE below
3. Define your function for the burden based on the variable NO3, on the command line or using your script my_brdn.jnl.
yes? GO my_brdn NO3 burden
4. Define a new variable burden_on_lat using this axis
yes? LET sorted_burden = SAMPLEI(burden, lat_index)
yes? LET burden_on_lat = RESHAPE( sorted_burden, new )
5. Now, to plot the NO3 burden averaged into 5 degree latitude bands we could use
yes? PLOT burden_on_lat[Y=60s:30n:5@AVE]
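The logic of the steps above (find a sort order, reorder the coordinate and the measurement with it, then average into bands) can be sketched in Python. This is an illustration under stated assumptions, not Ferret's implementation: indices here are 0-based, the band average is a plain unweighted mean, and the latitude and burden values beyond those in the sample table are made up.

```python
def sort_index(values):
    """Indices that put `values` into increasing order (0-based)."""
    return sorted(range(len(values)), key=lambda n: values[n])

def sample(values, index):
    """Pick elements of `values` in the order given by `index`."""
    return [values[n] for n in index]

def band_average(coords, values, lo, hi, width):
    """Unweighted mean of `values` falling in each [start, start + width) band."""
    bands = {}
    start = lo
    while start < hi:
        members = [v for c, v in zip(coords, values) if start <= c < start + width]
        bands[start] = sum(members) / len(members) if members else None
        start += width
    return bands

lat = [30.0, 60.2, -12.5, 31.0]       # unsorted, as read from a table
burden = [6.2, 5.5, 4.0, 5.8]

order = sort_index(lat)               # plays the role of lat_index
sorted_lat = sample(lat, order)       # increasing, usable as axis coordinates
sorted_burden = sample(burden, order)
bands = band_average(sorted_lat, sorted_burden, -60, 65, 5)
```

Bands that contain no samples come back as None, which corresponds to a missing value in the band-averaged result.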
Ch8 Sec2.5. Visualization techniques for point data
Scattered point data can be displayed in a number of ways.
A simple scatter plot showing the locations of points
yes? PLOT/VS lon,lat
yes? GO land
Use GO/help land for an explanation of resolving incompatible longitude encodings, should they arise.
A scatter plot in which the symbols are colored by value, with control over the color palette and resolution, can be made using the polymark.jnl script. For example, to plot using star symbols in color levels by 10s use
yes? GO polymark POLYGON/LEV=(0,100,10) lon lat NO3 star
Type GO/HELP polymark for more options.
See also the chapter "Customizing Plots", section "Map Projections" (p. 224) for guidance on plotting scattered data. The map projection scripts can be used in conjunction with the above.
Ch8 Sec3. VERTICAL PROFILES
A single profile, possibly consisting of multiple variables, can be regarded as a simple 1-dimensional data set. Ferret's plotting and analysis tools apply in a straightforward manner.
Collections of profiles resemble point data sets in their X, Y, and T structure; however, at each point there is a 1-dimensional Z-axis structure. In general, the Z axes at each point may differ.
Ch8 Sec3.1. How collections of profiles are structured in Ferret
If the collection of profiles is sufficiently small (say 4 or fewer), then it is straightforward to handle them simply as separate data sets. The D= qualifier may be used to designate which profile is being referred to. The IF ... THEN ... ELSE syntax may be used to combine the profiles into expressions.
As the number of profiles in the collection grows larger, however, it becomes necessary to merge them into a single structure. Typically, the sequence number of the profile, 1, 2, ...,N, becomes the X axis of the collection. The longitude, latitude, and time of each profile become dependent variables indexed by the sequence number. The Z structures of the profiles are blended into a single Z axis by a choice of techniques. The steps to creating a blended data set then become:
1. Determine the nature of the Z axis to be used and the collection of variables to be defined on the grid
2. Create an empty grid with the desired structure in a file
3. Populate the file with the profiles, each profile in turn.
The determination of the Z axis structure may be by any of these techniques:
1. Supply an arbitrary Z axis to which all of the individual profiles will be regridded by linear interpolation. This technique produces a data set which is very easy to work with and small in size; however, some of the data have been altered by linear interpolation. The default Ferret regridding (GZ=@LIN) is used for this technique.
2. Create a Z axis which is a superset of the Z axis points from all of the grids. In the final data set this axis will be sparsely populated, containing only those Z points that were actually present in each profile.
This technique produces a data set which is 100% faithful to the original data and reasonably easy to work with, but may become very large if the number of profiles is large and the Z axes vary greatly. Ferret "exact match" regridding (GZ=@XACT) is used for this technique.
3. Do not create a Z axis at all; instead, store the Z coordinates as a dependent variable. The Z axis becomes simply an index counter of length equal to the longest profile. This technique produces a data set which is 100% faithful to the original data and of modest size; however, it is the most laborious to work with.
The choice of technique depends on the nature of the profile collection and the types of analysis or visualization to be done. Often it is desirable to combine technique 1, which is fast and simple, with technique 2 or 3, which can be used for spot checking if there is a question of data fidelity. If technique 3 is chosen (Z coordinates in a dependent variable), the techniques for handling the variables are very similar to those for sigma coordinate data, described in a separate section of this chapter (p. 251).
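The difference between techniques 1 and 2 can be seen in a small Python sketch. It is a conceptual illustration only, not Ferret's regridding code: the function names and the two example casts are hypothetical, each cast is assumed to have at least two depths in increasing order, and None stands in for a missing value.

```python
def interp_linear(z_src, v_src, z_target):
    """Technique 1: linearly interpolate one profile onto a common Z axis."""
    out = []
    for z in z_target:
        if z < z_src[0] or z > z_src[-1]:
            out.append(None)                     # no extrapolation beyond the cast
            continue
        for k in range(len(z_src) - 1):
            if z_src[k] <= z <= z_src[k + 1]:
                f = (z - z_src[k]) / (z_src[k + 1] - z_src[k])
                out.append(v_src[k] * (1 - f) + v_src[k + 1] * f)
                break
    return out

def exact_match(profiles):
    """Technique 2: superset Z axis, filled only where a cast has that exact depth."""
    z_all = sorted({z for z_src, _ in profiles for z in z_src})
    table = []
    for z_src, v_src in profiles:
        lookup = dict(zip(z_src, v_src))
        table.append([lookup.get(z) for z in z_all])
    return z_all, table

# two casts as (depths, values) with different vertical sampling
casts = [([0.0, 10.0, 30.0], [20.0, 18.0, 12.0]),
         ([5.0, 10.0, 50.0], [19.0, 17.5, 9.0])]

regular = [interp_linear(z, v, [0.0, 20.0, 40.0]) for z, v in casts]
z_all, sparse = exact_match(casts)
```

The interpolated result is compact but contains values that were never measured, while the exact-match result keeps only measured values at the cost of a longer, sparsely filled axis.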
Ch8 Sec3.2. Getting profile data into Ferret
As of 4/99 the approaches to merging collections of profiles into a single structure are still "manual." (Data which are stored as global attributes in the input files, as is done in EPIC files, are lost in this process.) This text describes an example of the manual process used, where the target Z axis is created arbitrarily and data are interpolated to it. In this example the profiles are read from ASCII files, so the Z axis of each profile has to be created. This example does not save the longitude, latitude, and time positions of the casts.
! for this example we begin by manufacturing some data
! ... pretend this is one of your casts - unequal vertical spacing
LIST/FILE=test_cast.dat/NOHEAD/FORM=(2F)/I=1:10 10*i+randu(i), sin(i/6)
! create a grid suitable for ALL casts together
! make the points regular in X and Z ... they need not be, however
DEFINE AXIS/DEPTH/Z=0:1000:20/UNIT=meters zall ! Arbitrary z axis
DEFINE AXIS/X=0:9:1/UNIT="sequence" xall
DEFINE GRID/X=xall/Z=zall gall
! create an empty output file
! if we were reading netCDF files we would create variables to hold
! longitude, latitude, and time (year, month, day).
! A latitude output variable, for example, is created below
LET outvar = 1/0 * x[g=gall] * z[g=gall]
SET VARIABLE/TITLE="My merged var"/UNITS="my units" outvar
SAVE/FILE=all_casts.cdf/ILIMITS=1:10/ZLIMITS=0:1000 outvar
LET LAT = 1/0*X[gx=gall]
SET VARIABLE/TITLE="Latitude"/UNITS="degrees" lat
SAVE/APPEND/FILE=all_casts.cdf/ILIMITS=1:10 lat
! read in a single cast (the fake data we created)
! if we were reading a netCDF file this block would be unnecessary
FILE/VAR=depth,invar test_cast.dat
! make Z axis for 1 profile
DEFINE AXIS/Z/DEPTH/UNIT=meters z1cast=depth
DEFINE AXIS/X=0:0:1/UNIT="sequence" x1cast ! sequence no. of 1st cast
DEFINE GRID/X=x1cast/Z=z1cast g1cast
CANC DATA 1
! save first cast interpolated to many-point Z axis
FILE/VAR="-,invar"/GRID=g1cast test_cast.dat
LET outvar = invar[g=gall]
SAVE/APPEND/FILE=all_casts.cdf outvar[I=1]
CANC DATA 1
! if available, output latitude thusly
! LET lat = 0*X[g=gall] + RESHAPE(Y[G=invar],X[gx=gall])
! SAVE/append/file=all_casts.cdf lat[I=1]
! save next cast
DEFINE AXIS/X=1:1:1/UNIT="sequence" x1cast ! X position of 2nd cast
FILE/VAR="-,invar"/grid=g1cast test_cast2.dat
SAVE/APPEND/FILE=all_casts.cdf outvar[I=2]
CANC DATA 1
! etc for next 8 casts
! This may be automated with: REPEAT/I=1:10 GO output_one_profile
! where the script output_one_profile.jnl reads profile file names
! from a list
The output data set which we create will be structured as follows:
yes? CANCEL VAR/ALL
yes? USE all_casts
yes? SHOW DATA
     currently SET data sets:
    1> ./all_casts.cdf  (default)
 name     title            I       J      K       L
 OUTVAR   My merged var    1:10    ...    1:51    ...
 LAT      Latitude         1:10    ...    ...     ...
Ch8 Sec3.3. Defining vertical sections from profiles
In the data set created above the profiles may or may not be ordered as needed to create a valid section. There are many possible ways to order the data. Often more than one technique is applicable to a single data set. The data may be ordered along a ship track, ordered by increasing latitude, ordered by path distance along a regression line, etc.
Continuing with the example above, we can order the profiles into increasing latitude with:
yes? let order = SORTI(lat)
yes? let section = SAMPLEI(outvar, order)
Other definitions of the variable order may be created by straightforward means to apply other ordering principles.
As defined above, "section" has an X axis which is the values 1, 2, 3,...N from the Ferret ABSTRACT axis. To cast this on a proper latitude axis, use these two steps:
yes? DEFINE AXIS/Y/UNITS=degrees yax_sect=SAMPLEI(lat, order)
yes? LET ysection = RESHAPE(section,Y[gy=yax_sect]+Z[gz=zall])
Ch8 Sec3.4. Visualization and analysis techniques for profile sections
The variables "section" and "ysection" defined above may be plotted and analyzed with the normal gridded plot commands. For example,
yes? CONTOUR section ! contour plot ordered on X=1,2,3,...
yes? FILL ysection ! color contour plot on formatted latitude axis
yes? PLOT/Y=20S/Z=100:500 ysection ! profile at 20 south
yes? PLOT ysection[Z=@loc:20] ! depth of 20 degree isotherm
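The last command above locates the depth at which a profile first takes a given value. As a conceptual illustration, the following Python sketch finds the first crossing of a target value along one profile, assuming linear interpolation between neighboring depths. The function name locate and the sample profile are hypothetical, and the sketch is not claimed to match the @LOC transformation in every edge case.

```python
def locate(z, values, target):
    """Coordinate of the first occurrence of `target`, linearly interpolated."""
    for k in range(len(z) - 1):
        a, b = values[k], values[k + 1]
        if a is None or b is None:
            continue                              # skip gaps in the profile
        if a == target:
            return z[k]
        if (a - target) * (b - target) < 0:       # target lies between a and b
            return z[k] + (target - a) / (b - a) * (z[k + 1] - z[k])
    if values and values[-1] == target:
        return z[-1]
    return None                                   # value never reached

depth = [0.0, 100.0, 200.0, 300.0]                # meters
temp = [26.0, 22.0, 18.0, 12.0]                   # degrees
z20 = locate(depth, temp, 20.0)                   # depth of the 20 degree isotherm
```

Applied to every profile in a section, this yields one depth per profile, which is why the result can be drawn as a simple line plot along the section axis.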
Ch8 Sec3.5. Subsampling gridded fields onto profile coordinates
The technique described for sampling grids at scattered point values will work unmodified for collections of vertical profiles. The Z coordinate of the gridded variable will be retained unmodified throughout the sampling operations. Regrid the final result variable to other Z axes as desired.
Ch8 Sec4. COLLECTIONS OF TIME SERIES
Handling of collections of time series is analogous to handling collections of vertical profiles, described above. The choices of
1. a single interpolated time axis (using the default, GT=@LIN, regridding)
2. a super-set of all times axis (using "exact match," GT=@XACT, regridding)
should be considered. Choice 3, in which time would be handled as a dependent variable, is possible but awkward, due to the multiplicity of time encodings.
Ch8 Sec5. COLLECTIONS OF 2-DIMENSIONAL GRIDS
Handling collections of 2-dimensional grids (e.g. ZT grids from acoustic current profilers) is a straightforward extension of the techniques described under collections of profiles. If the time axes of the input grids are all identical, no additional work is needed beyond the techniques described there. If the time axes differ then follow the guidance given under Collections of Time Series, using intermediate variable definitions that reconcile the time axes into a single uniform axis before saving the input variables into a merged output file.
Ch8 Sec6. LAGRANGIAN DATA
Lagrangian data (ship tracks, drifters, etc.) is a special case of scattered point data described in a preceding section. In the terminology of "Defining gridded variables from point data" Lagrangian data is simply point data organized onto a 1-dimensional time axis grid.
Ch8 Sec6.1. Visualization techniques for Lagrangian data
Ferret has several visualization tools that specifically address the needs of Lagrangian data. There are three scripts:
| polymark (polymark_demo)   | marks value-colored symbol at each location |
| polytube (polytube_demo)   | creates a line following the Lagrangian track with color varying according to a Lagrangian variable |
| trackplot (trackplot_demo) | creates a line plot of a Lagrangian variable where the zero line of the plot follows the Lagrangian track |
Overlays of the trackplot script are useful to visualize more than one variable. Run the demonstration scripts noted above for each tool for an example of its use with Lagrangian data.
Ch8 Sec7. SIGMA COORDINATE DATA
With sigma coordinate data the vertical coordinate (or layer thickness) is available as a dependent variable and the Z axis of the sigma-encoded variables is layer number (the Z index). This is precisely analogous to method 3 of handling collections of profiles, above (p. 248). The family of ZAXREPLACE functions may be used to regrid this kind of data to a Z axis with physical units (p. 85).
See also the FAQ on Using Sigma Coordinates.
Ch8 Sec7.1. Visualization techniques for sigma coordinate data
Visualizations of sigma coordinate data in vertical section planes are best handled with the 3-argument versions of the SHADE, FILL, CONTOUR, and VECTOR commands. See further information in the chapter "Customizing Plots" (p. 183).
For visualization of sigma coordinate data in other planes or orientations use the techniques described in the next section.
Ch8 Sec7.2. Analysis techniques for sigma coordinate data
Analysis of sigma coordinate data, which requires shifting to depth or pressure coordinates, is facilitated by the function ZAXREPLACE, which converts from layer number to other vertical coordinate axes. See sigma_coordinate_demo.jnl for an example. If the data set provides layer thickness rather than depth, a depth variable may be created using integration with @iin.
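The two operations mentioned here, building a depth variable from layer thicknesses and moving layer-indexed data onto a depth axis, can be sketched in Python for a single water column. This is an illustration under assumptions, not the algorithm of ZAXREPLACE or @iin: the function names are hypothetical, depths are taken at layer centers (one possible convention), and interpolation is linear with no extrapolation.

```python
def layer_centers(thickness):
    """Mid-depth of each layer, from a running sum of layer thicknesses."""
    centers, top = [], 0.0
    for h in thickness:
        centers.append(top + h / 2.0)
        top += h
    return centers

def replace_z(depths, values, targets):
    """Linearly interpolate values known at `depths` onto the `targets` depths."""
    out = []
    for z in targets:
        found = None                              # stays None outside the column
        for k in range(len(depths) - 1):
            if depths[k] <= z <= depths[k + 1]:
                f = (z - depths[k]) / (depths[k + 1] - depths[k])
                found = values[k] * (1 - f) + values[k + 1] * f
                break
        out.append(found)
    return out

thickness = [10.0, 20.0, 40.0]        # layers 1..3 of one water column, meters
temp = [25.0, 20.0, 10.0]             # value carried by each layer
depth = layer_centers(thickness)      # depth as a dependent variable
on_z = replace_z(depth, temp, [5.0, 35.0, 60.0])
```

In a real sigma coordinate data set the thicknesses differ from column to column, so the same two steps are repeated at every horizontal grid point.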
Ch8 Sec8. CURVILINEAR COORDINATE DATA
By "curvilinear coordinate data" we refer to data which is curvilinear in the XY plane. We presume that the X,Y coordinates (typically longitude, latitude) are available through other dependent variables.
[Figure: an example of a curvilinear grid, taken from http://www.wldelft.nl/rnd/intro/topic/2003-swe/]
Curvilinear data may be defined by a map projection (see p. 224), or by data in a file that has a curvilinear grid. A curvilinear grid has longitudes and latitudes defined by coordinates (lon[i,j], lat[i,j]) in 2D, and the data fields are also defined on the [i,j] index grid. In the CF standard for netCDF files at http://www.cgd.ucar.edu/cms/eaton/cf-metadata/CF-1.0.html, these grids are discussed in the section titled "Two-dimensional latitude, longitude coordinate variables". The netCDF header for a file containing data on a curvilinear grid looks like this (when viewed with the Unix command ncdump -h). Note how the coordinate variables lon and lat are 2-D fields. The coordinate variables and the data field tmp share a grid in index space.
dimensions:
        xc = 180 ;
        yc = 173 ;
variables:
        float xc(xc) ;
                xc:long_name = "x-index in Cartesian system" ;
                xc:units = "m" ;
        float yc(yc) ;
                yc:long_name = "y-index in Cartesian system" ;
                yc:units = "m" ;
        float lon(yc,xc) ;
                lon:long_name = "longitude" ;
                lon:units = "degrees_east" ;
        float lat(yc,xc) ;
                lat:long_name = "latitude" ;
                lat:units = "degrees_north" ;
        float tmp(yc,xc) ;
                tmp:long_name = "temperature" ;
                tmp:units = "K" ;
                tmp:coordinates = "lon lat" ;
When such a dataset is opened in Ferret, the output of a SHOW DATA command will look like:
yes? show data
     currently SET data sets:
    1> ./tmp.nc  (default)
 name     title          I        J        K      L
 TMP      temperature    1:180    1:173    ...    ...
 LON      longitude      1:180    1:173    ...    ...
 LAT      latitude       1:180    1:173    ...    ...
This data can be plotted with the 3-argument SHADE, FILL or CONTOUR commands
yes? SHADE tmp, lon, lat
You can see what the grid looks like by doing a shade plot of the coordinate variables:
yes? set view ul; shade lon
yes? set view ll; shade lat
Ch8 Sec8.1. Visualization techniques for curvilinear coordinate data
Visualizations of curvilinear coordinate data in the XY plane are best handled with the 3-argument versions of the SHADE, FILL, and CONTOUR commands. See further information in the chapter "Customizing Plots" (p. 183).
For visualization of curvilinear coordinate data in other planes or orientations use the techniques described under "Analysis techniques for curvilinear coordinate data."
Ch8 Sec8.2. Analysis techniques for curvilinear coordinate data
Analysis of curvilinear coordinate data may be done in the curvilinear coordinate system or in a rectilinear (including lat-long) coordinate system. If the analysis is done in the curvilinear coordinate system, it is the responsibility of the user to ensure that the proper geometric factors are applied when integrals and derivatives are computed. Converting other fields to the curvilinear coordinate system is most easily accomplished with the function RECT_TO_CURV. Curvilinear grids may be converted to rectilinear grids using the functions CURV_TO_RECT_MAP and CURV_TO_RECT.
Ch8 Sec9. POLYGONAL DATA
By "polygonal data" we refer to a class of point data set where each point represents a polygonal region rather than a single coordinate. An example of polygonal data would be a value associated with each state in the United States.
Ch8 Sec9.1. Visualization techniques for polygonal data
Visualization of polygonal data is best handled with the POLYGON command. If the coordinates of the polygon vertices are available in 2-dimensional arrays, XPOLY and YPOLY, in which the axes of the arrays are the polygon vertices and the sequence of polygons, the use of the POLYGON command is straightforward. The POLYGON command can also handle sequences of polygons encoded in 1-dimensional arrays with missing values separating each polygon.
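The 1-dimensional encoding, in which a missing value marks the boundary between consecutive polygons, can be illustrated with a short Python sketch. It shows only how such arrays are interpreted; the function name split_polygons, the sample vertices, and the choice of -1.0e34 as the missing-value flag are assumptions made for the example.

```python
def split_polygons(xpts, ypts, missing):
    """Split 1-D vertex lists into polygons at each missing-value separator."""
    polygons, current = [], []
    for x, y in zip(xpts, ypts):
        if x == missing or y == missing:
            if current:                           # close the polygon in progress
                polygons.append(current)
            current = []
        else:
            current.append((x, y))
    if current:                                   # last polygon has no trailing separator
        polygons.append(current)
    return polygons

BAD = -1.0e34                                     # assumed missing-value flag
xpoly = [0, 1, 1, BAD, 5, 6, 6, 5]
ypoly = [0, 0, 1, BAD, 5, 5, 6, 6]
polys = split_polygons(xpoly, ypoly, missing=BAD)
```

Here a triangle and a square share one pair of arrays; a per-polygon value array would then have one entry for each element of polys.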
Ch8 Sec9.2. Analysis techniques for polygonal data
Ferret version 5.0 does not have any tools specifically addressing the analysis of polygonal data sets. The analysis of these data sets in Ferret requires the creation of a gridded mask field corresponding to the polygonal regions. (An external function could be written that would create a gridded mask of arbitrary resolution from polygonal coordinates.)
Once the mask is created, the standard gridded operators for averaging, integrating, etc. can be used. For example, if variable cal_mask contains a gridded mask of the state of California on latitude and longitude axes of 10 minute resolution then this definition would compute the average of a gridded variable, var, over California:
yes? let cal_var = cal_mask * var[g=cal_mask]
yes? let cal_average = cal_var[x=@ave,y=@ave]
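The effect of multiplying by a mask and then averaging can be sketched in Python. This is a conceptual illustration with assumptions stated plainly: the mask holds 1 inside the region and a missing value (None) outside, each grid row is weighted by the cosine of its latitude as a stand-in for cell area, and the grid values are made up. It is not claimed to reproduce the exact weighting of Ferret's @ave transformation.

```python
import math

def masked_average(var, mask, lats):
    """Area-weighted mean of var[j][i] over cells where mask[j][i] is not missing."""
    total = weight_sum = 0.0
    for row, mask_row, lat in zip(var, mask, lats):
        weight = math.cos(math.radians(lat))      # cell area shrinks toward the poles
        for value, flag in zip(row, mask_row):
            if flag is None or value is None:
                continue                          # outside the region or no data
            total += weight * value
            weight_sum += weight
    return total / weight_sum if weight_sum else None

lats = [32.0, 36.0, 40.0]                         # latitude of each grid row
cal_mask = [[1, None], [1, 1], [None, 1]]         # 1 inside the region, missing outside
var = [[10.0, 99.0], [20.0, 20.0], [99.0, 30.0]]
cal_average = masked_average(var, cal_mask, lats)
```

The cells holding 99.0 lie outside the mask and do not contribute, so the result depends only on the values inside the region.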