Ferret accepts input data from both ASCII and binary files and recognizes two standardized, self-describing data formats: NetCDF and TMAP. Network Common Data Format (NetCDF) is the suggested method of data storage.
SET DATA_SET or just SET DATA specifies a data set for access. ASCII and binary files can be read using SET DATA/EZ (also known as "FILE"). To unambiguously specify the format of a data set, include the extension .cdf or .des in its name, or use the qualifier /FORMAT=CDF.
To examine what each data set consists of (variables, grids, etc.) after specifying them with SET DATA, use SHOW DATA. This command displays the variables in the data set and over what geographical and time ranges they are defined.
Here is an example of Ferret's output:
yes? SET DATA coads_climatology
yes? SHOW DATA
currently SET data sets:
1> /home/e1/tmap/fer_dsets/descr/coads_climatology.des (default)
name     title                       I         J         K         L
SST      SEA SURFACE TEMPERATURE     1:180     1:90      1:1       1:12
AIRT     AIR TEMPERATURE             1:180     1:90      1:1       1:12
SPEH     SPECIFIC HUMIDITY           1:180     1:90      1:1       1:12
WSPD     WIND SPEED                  1:180     1:90      1:1       1:12
UWND     ZONAL WIND                  1:180     1:90      1:1       1:12
VWND     MERIDIONAL WIND             1:180     1:90      1:1       1:12
SLP      SEA LEVEL PRESSURE          1:180     1:90      1:1       1:12
If multiple data sets have been requested in a single Ferret session, the last requested will be the default data set. To specify other data sets, use the name of the data set or the number of the set as given by the SHOW DATA statement. For example:
yes? LIST/D=2 temp
will list the data for the variable "temp" in data set number 2 as displayed by SHOW DATA/BRIEF, while
yes? LIST temp[D=levitus_climatology] - temp[D=coads_climatology]
will list the differences between the variable "temp" in data set "levitus_climatology" and data set "coads_climatology."
Once a data set has been opened, you can find the data set name via the RETURN keyword (see p. 138):
yes? say `var,RETURN=dset`
yes? say `var,RETURN=dsetnum`
If a filename begins with a number, Ferret does not recognize it, but the file may be specified using its Unix pathname, e.g.
yes? use "./123"
or
yes? file/var=a "./45N_180W.dat"
The Network Common Data Format (NetCDF) is an interface to a library of data access routines for storing and retrieving scientific data. netCDF allows the creation of data sets which are self-describing and platform-independent. netCDF was created under contract with the Division of Atmospheric Sciences of the National Science Foundation and is available from the Unidata Program Center in Boulder, Colorado (www.unidata.ucar.edu).
See the chapter "Converting Data to NetCDF" (p. 267), for a complete description of how to create netCDF data sets or how to convert existing data sets into NetCDF.
To output a variable in NetCDF, simply use:
yes? LIST/FORMAT=CDF variable_name
LIST/FORMAT=CDF (alias SAVE) can also be used with abstract variables:
yes? SAVE/FILE=example.cdf/I=1:100 sin(I/100)
This will create a file named example.cdf.
The current region and data sets determine the variable names in the saved file and the range over which they are saved. Saved data can then be accessed as follows:
yes? USE example
(USE is an alias for SET DATA/FORMAT=CDF.)
To read a netCDF data set that is on a DODS server, simply specify the DODS address in quotes:
yes? use "http://ferret.pmel.noaa.gov/cgi-bin/nph-nc/data/coads_climatology.nc"
If a filename is not specified, Ferret will generate one. (See command SET LIST/FILE in the Commands Reference section, p. 407). An example of converting TMAP-formatted data to netCDF goes as follows:
yes? SET DATA coads_climatology
yes? SAVE/L=1 sst,airt,uwnd,vwnd
These commands will save sst, airt, uwnd, and vwnd at the first time step over their entire regions to a netCDF file named by Ferret.
One advantage to using netCDF is that users on a different system (e.g., VMS instead of Unix) with different software (e.g., an analysis tool other than Ferret) can share data easily without substantial conversion work. netCDF files are self-describing; with a simple command the size, shape and description of all variables, grids and axes can be seen.
Ch2 Sec2.1. NetCDF data and strides
Beginning with Ferret version 5.1, the internal functioning of netCDF reads has been changed when "strides" are involved. Suppose that CDFVAR represents a variable from a netCDF file. In version 5.0 and earlier the command PLOT CDFVAR[L=1:1000:10] would have read the entire array of 1000 points from the file; Ferret's internal logic would have subsampled every 10th point from the resulting array in a manner that was consistent for netCDF variables, ASCII variables, user defined variables, etc. In V5.1 strides applied to netCDF variables are given special treatment: subsampling is done by the netCDF library. The primary benefit of this is to make network access to remote data sets via DODS more efficient. Beginning with Ferret v5.4, strides can be applied across the "branch point" of a modulo variable without loss of efficiency for netCDF data sets, as long as the stride is an integer fraction of the modulo length times the number of points on the axis. A remote satellite image of size, say, 1000x1000 points x 8 bit depth (8 megabytes) can efficiently be previewed using
SHADE DODS_VAR[i=1:1000:10,j=1:1000:10]
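The saving from a strided read can be estimated with simple index arithmetic. The following Python sketch is illustrative only: it does not call Ferret or the netCDF library, and `strided_indices` is a hypothetical helper that lists the 1-based indices selected by a `lo:hi:stride` range.

```python
def strided_indices(lo, hi, stride):
    """Return the 1-based indices selected by a Ferret-style lo:hi:stride range."""
    return list(range(lo, hi + 1, stride))

# One axis: i=1:1000:10 keeps every 10th point.
i_sel = strided_indices(1, 1000, 10)
j_sel = strided_indices(1, 1000, 10)

# Two strided axes: the preview needs len(i_sel) * len(j_sel) values
# instead of the full 1000 x 1000 array.
full_points = 1000 * 1000
preview_points = len(i_sel) * len(j_sel)
reduction = full_points // preview_points

print(len(i_sel), i_sel[:3], i_sel[-1])
print(preview_points, reduction)
```

With a stride of 10 on both axes, only one point in a hundred has to cross the network.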
If a grid or axis from a netCDF file is used in the definition of a LET-defined variable (e.g. LET my_X = X[g=sst[D=coads_climatology]]) that variable definition will be invalidated when the data set is canceled (CANCEL DATA coads_climatology, in the preceding example). There is a single exception to this behavior: netCDF files such as climatological_axes.cdf, which define grids or axes that are not actually used by any variables. These grids and axes will remain defined even after the data set itself has been canceled. They may be deleted with explicit use of CANCEL GRID or CANCEL AXIS.
In Ferret version 6.02 we introduce a method whereby a grid may be redefined with strided axes. This "native stride" syntax means that the stride information needs to be specified only once, and variable names do not need to be changed.
Old syntax:
yes? SET DATA mydat.nc
yes? LET strided_var = var[i=1:1000:10,j=1:1000:10]
yes? FILL strided_var ! Use the new name strided_var everywhere.
New syntax:
yes? SET DATA mydat.nc
yes? SET AXIS/STRIDE=10 `var,RETURN=xaxis`
yes? SET AXIS/STRIDE=10 `var,RETURN=yaxis`
yes? FILL var ! The original variable name can be used
An offset may be specified on the SET AXIS/STRIDE command with SET AXIS/STRIDE=/OFFSET=. The offset value must be less than the stride value, and it refers to the first index to use:
Old syntax:
yes? SET DATA mydat.nc
yes? LET strided_var = var[i=4:1000:10]
New syntax:
yes? SET DATA mydat.nc
yes? SET AXIS/STRIDE=10/OFFSET=4 `var,RETURN=xaxis`
This syntax associates a new strided axis with the original axis. Everywhere that original axis is used, the new strided behavior will be applied. This means that all variables from all datasets that share the same exact axis will appear on the new strided axis. The original axis definition still exists and we can cancel the stride behavior with
yes? CANCEL AXIS/STRIDE axisname
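The relationship between positions on the strided axis and positions on the original axis can be sketched in Python. This is an assumption drawn from the example above, in which /STRIDE=10/OFFSET=4 is equivalent to i=4:1000:10; `original_index` is a hypothetical helper, not a Ferret function.

```python
def original_index(k, stride, offset):
    """Original-axis index (1-based) of point k (1-based) on the strided axis.

    Assumes, as in the i=4:1000:10 example, that OFFSET names the first
    original index to use.
    """
    return offset + (k - 1) * stride

npoints = 1000
stride, offset = 10, 4

# Collect every original index that survives on the strided axis.
kept = []
k = 1
while original_index(k, stride, offset) <= npoints:
    kept.append(original_index(k, stride, offset))
    k += 1

print(len(kept), kept[:3], kept[-1])
```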
Ch2 Sec2.2. NetCDF data attributes
Beginning with Ferret V6.0, Ferret has access to attributes of netCDF variables, including coordinate variables. In fact, attributes can be defined and used for user variables and variables from any kind of dataset. See the section in the next chapter about dataset and variable attributes (p. 65).
Ch2 Sec2.3. NetCDF Data with the bounds attribute
The CF standard for netCDF files defines a bounds attribute for coordinate axes, where the upper and lower bounds of the grid cells along an axis are specified by a bounds variable which is of size N*2 for an axis of length N. See Section 7.1 of the CF document:
http://www.cgd.ucar.edu/cms/eaton/cf-metadata/CF-1.0.html
The coordinates on the axis may be anywhere within the cells defined by the upper and lower cell bounds. Ferret uses these as the upper and lower bounds of axis cells (also known as boxes). They may be listed or otherwise accessed using the pseudo-variables XBOXLO, XBOXHI, YBOXLO, etc.
For example, a simple netCDF file with bounds would have the following ncdump output:
netcdf irrx {
dimensions:
    XAX = 4 ;
    bnds = 2 ;
variables:
    double XAX(XAX) ;
        XAX:point_spacing = "uneven" ;
        XAX:axis = "X" ;
        XAX:bounds = "XAX_bnds" ;
    double XAX_bnds(XAX, bnds) ;
    float V(XAX) ;
        V:missing_value = -1.e+34f ;
        V:_FillValue = -1.e+34f ;
        V:long_name = "SEA SURFACE TEMPERATURE" ;
// global attributes:
        :history = "FERRET V5.60 4-Jun-04" ;
data:
 XAX = 1, 2, 5, 6 ;
 XAX_bnds =
  0., 1.5,
  1.5, 2.5,
  2.5, 5.5,
  5.5, 7. ;
 V =
  28.20222, 28.36456, 28.35381,0 28.2165, 28.48889, 28.31556 ;
}
The CF standard allows for axes in a file that may have discontiguous bounds (the upper bound of one cell is not the same as the lower bound of the next cell). Ferret does not allow such an axis. When discontiguous bounds are encountered in a file, we arbitrarily choose to use the lower bounds throughout, with the upper bound of the topmost cell to close the definition. This way all axes have contiguous upper and lower bounds. A warning message is issued.
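The rule described above (keep every lower bound, then close the axis with the upper bound of the topmost cell) can be sketched in Python. This is a minimal illustration, not Ferret's implementation; `make_contiguous`, `is_contiguous`, and the sample bounds are hypothetical.

```python
def is_contiguous(bounds):
    """True if each cell's upper bound equals the next cell's lower bound."""
    return all(bounds[k][1] == bounds[k + 1][0] for k in range(len(bounds) - 1))

def make_contiguous(bounds):
    """Rebuild (lower, upper) pairs from the lower bounds only.

    The upper bound of the last cell closes the definition, so every
    interior upper bound is replaced by the next cell's lower bound.
    """
    edges = [lower for lower, _ in bounds]
    edges.append(bounds[-1][1])
    return list(zip(edges[:-1], edges[1:]))

# Hypothetical axis with gaps between cells 1-2 and 3-4.
gappy = [(0.0, 1.0), (1.5, 2.5), (2.5, 5.0), (5.5, 7.0)]
fixed = make_contiguous(gappy)
print(is_contiguous(gappy), is_contiguous(fixed))
print(fixed)
```

In this sample the repaired cells coincide with the XAX_bnds values of the ncdump listing shown earlier.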
DEFINE AXIS/BOUNDS may be used to create an axis with cell bounds. All irregular axes are saved with a bounds attribute (beginning with Ferret v5.70), and the user may request that all axes be written with the bounds attribute by using the SAVE/BOUNDS command.
Note that if you have a dataset that has an irregular time axis and a bounds attribute on that axis and you force Ferret to apply a regular time axis with
yes? USE/REGULART my_data.nc
then the bounds are ignored: the regular time axis is formed from the first and last coordinate and the number of points.
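As a rough illustration of how a regular axis follows from only the first coordinate, the last coordinate, and the number of points, consider this Python sketch. The helper `regular_axis` and the sample coordinates are hypothetical; Ferret's own handling of time units is not modeled.

```python
def regular_axis(first, last, npoints):
    """Evenly spaced coordinates from first to last, inclusive."""
    if npoints == 1:
        return [first]
    step = (last - first) / (npoints - 1)
    return [first + k * step for k in range(npoints)]

# Hypothetical irregular time coordinates; any bounds on them are ignored.
irregular = [0.0, 1.0, 4.0, 9.0]
regular = regular_axis(irregular[0], irregular[-1], len(irregular))
print(regular)
```

Only the end points of the original axis survive; the interior coordinates are replaced by evenly spaced ones.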
Ch2 Sec2.4. Multi-file NetCDF data sets
Ferret supports collections of netCDF files that are regarded as a single netCDF data set. Such data sets are referred to as "MC" (multi CDF) data sets. They are particularly useful to manage the outputs of numerical models. MC data sets use a descriptor file, in the style of TMAP-formatted data sets. The data set is referred to inside Ferret by the name of this descriptor file.
A collection of netCDF files is suitable to form a multi-file data set if
1) The files are connected through their time axis: each file represents one or more time snapshots of the variables it contains.
2) All non-time-dependent variables in the data set must be contained in the first file of the data set (or those variables will not appear in the merged, MC, data set).
Note that previous to version 5.2, each file had to be self-documenting with respect to the time axis of the variables, even if the time axis represents only a single point. (All of the time axes must be identically encoded with respect to units and date of the time origin.) In version 5.3 and higher these checks are not performed. This means that the MC descriptor mechanism can be used to associate into time series groups of files that are not internally self-documenting with respect to time. See Chapter 10, section 4 (p. 289).
Beginning with version 5.8 of Ferret the stepfiles may contain different scale and offset values for the variables they contain (p. 275). Ferret reads and applies the scale and offset values as data from each stepfile is read. Note that the commands
yes? SAY `var, RETURN=nc_offset`
yes? SAY `var, RETURN=nc_scale`
return the latest scale and offset values that were applied.
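The effect of per-stepfile scaling can be sketched in Python, assuming the usual netCDF packing convention (unpacked = packed * scale + offset). The dictionaries below are hypothetical stand-ins for stepfiles; nothing here reads a real file.

```python
# Hypothetical stepfiles, each with its own packing parameters.
stepfiles = [
    {"scale": 0.5,  "offset": 10.0, "packed": [0, 2, 4]},
    {"scale": 0.25, "offset": 12.0, "packed": [0, 4, 8]},
]

def unpack(step):
    """Apply this stepfile's own scale and offset to its packed values."""
    return [p * step["scale"] + step["offset"] for p in step["packed"]]

series = []
last_applied = None
for step in stepfiles:
    series.extend(unpack(step))
    # Remember the most recently used pair, analogous to what
    # RETURN=nc_scale and RETURN=nc_offset report.
    last_applied = (step["scale"], step["offset"])

print(series)
print(last_applied)
```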
A typical MC descriptor file may be found in the chapter "Converting to netCDF", in the section "Creating a multi-NetCDF data set." (p. 289)
Ch2 Sec2.5. Non-standard NetCDF data sets
As discussed in the Chapter, "Converting Data to NetCDF," (p. 267) Ferret expects netCDF files to adhere to the COARDS conventions (coop_cdf_profile.html). If the files do not adhere to the COARDS conventions, Ferret will still attempt to access them. Often, the user can use Ferret controls for regridding, reshaping, and otherwise transforming data to recover the intended file contents.
Here are a few common ways in which netCDF files may deviate from the COARDS standard and how one may cope with those situations in Ferret.
In the COARDS conventions an axis (a.k.a. "coordinate variable") must have
monotonically-increasing coordinate values. If the coordinates are disordered
or repeating in a netCDF file, then Ferret will present the coordinates
to the user (in SHOW DATA) as a dependent variable, whose name is the axis
name, and it will substitute an axis of the index values 1, 2, 3, ... Note
that Ferret will apply this same behavior when files have long irregular
axis definitions that exceed Ferret's axis memory capacity.
If the coordinates of an axis are monotonically decreasing, instead of
increasing, Ferret will transparently reverse both the axis coordinates
and the dependent variables that are defined upon that axis. Note that
if Ferret writes a reverse-ordered variable to a new netCDF file (with
the SAVE command), the coordinates and data in the output file will be
in monotonically increasing coordinate order, reversed from the input file.
If the values of a dependent variable are reversed, but there is no associated
coordinate axis, then attach a minus sign to the corresponding axis
orientation in the USE/ORDER= qualifier to designate that the variable(s)
should be reversed along the corresponding axis.
The COARDS standard specifies that variable names should begin with a letter
and be composed of letters, digits, and underscores. In files where the
variable names contain other characters, references to those variable names
in Ferret must be enclosed in single quotes.
The COARDS standard specifies that if any or all of the dimensions of a
variable have the interpretations of "date or time" (a.k.a. "T"), "height
or depth" (a.k.a. "Z"), "latitude" (a.k.a. "Y"), or "longitude" (a.k.a.
"X") then those dimensions should appear in the relative order T, then
Z, then Y, then X in the CDL definition corresponding to the file. In files
where the axis ordering has been permuted the command qualifiers USE/ORDER=
(Command Reference, p. 399) allow the user to inform Ferret of the correct
permutation of coordinates. Note that if Ferret writes a permuted variable
to a new netCDF file (with the SAVE command), the coordinates and data
in the output file will be in standard X-Y-Z-T ordering (as indicated in
the user's /ORDER specification), permuted from the original file ordering.
See the Command Reference (p. 323) for a complete description of the ORDER
qualifier.
The COARDS standard specifies that a netCDF file may be created with more
than four dimensions. However the Ferret framework allows just four dimensions
at this time.
Ch2 Sec2.6. NetCDF and non-standard calendars
The netCDF conventions document discusses and defines usage for different calendar axes. These conventions for calendars are implemented in Ferret version 5.3. See:
http://www.cgd.ucar.edu/cms/eaton/cf-metadata/CF-current.html#cal
The calendars allowed are:
GREGORIAN or STANDARD (default)   Ferret uses the proleptic Gregorian calendar, which is the Gregorian calendar extended to dates before 1582-10-15.
NOLEAP or 365_DAY                 All years are 365 days long.
ALL_LEAP or 366_DAY               All years are 366 days long.
360_DAY                           All years are 360 days divided into 30 day months.
JULIAN                            Julian calendar; leap years with no adjustment at the turn of the century.
These calendars are compatible with the Udunits standard which has slightly different naming conventions, except that the gregorian or standard calendar is a proleptic Gregorian calendar in Ferret. If the mixed Julian/Gregorian calendar is desired, use a time origin of 1-jan-0001:00:00 and Ferret will apply the 2-day shift that was made historically when the Gregorian calendar was introduced. The Udunits standard can be found at:
http://my.unidata.ucar.edu/content/software/udunits/udunits.txt
udunits_1998.dat (A local copy of the above link).
The netCDF conventions recommend that the calendar be specified by the attribute time:calendar which is assigned to the time coordinate variable when there is a non-Gregorian calendar associated with a data set, i.e.
time:calendar=noleap
Ferret reads this attribute when it is present in a netCDF file and assigns the appropriate calendar identifier to the variable. When a variable with a non-Gregorian calendar is output to a netCDF file, the attribute is written to that file.
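The calendar definitions listed above differ only in how long each year is. The Python sketch below encodes those year lengths for illustration; `days_in_year` is a hypothetical helper, not part of Ferret or Udunits, and the Gregorian branch applies the proleptic rule to every year.

```python
def days_in_year(year, calendar="GREGORIAN"):
    """Length of a year, in days, under each supported calendar name."""
    cal = calendar.upper()
    if cal in ("GREGORIAN", "STANDARD"):
        # Proleptic Gregorian: century years are leap only if divisible by 400.
        leap = year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)
        return 366 if leap else 365
    if cal in ("NOLEAP", "365_DAY"):
        return 365
    if cal in ("ALL_LEAP", "366_DAY"):
        return 366
    if cal == "360_DAY":
        return 360
    if cal == "JULIAN":
        # No adjustment at the turn of the century.
        return 366 if year % 4 == 0 else 365
    raise ValueError("unknown calendar: " + calendar)

for name in ("GREGORIAN", "NOLEAP", "ALL_LEAP", "360_DAY", "JULIAN"):
    print(name, days_in_year(1900, name))
```

The year 1900 is a convenient check because it is the case where the Gregorian and Julian rules disagree.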
As of Ferret version 2.30, netCDF is the suggested format for data storage (see the chapter, "Converting to netCDF," p. 267). This section describing TMAP information is included only for users who already work with data in TMAP format.
To access TMAP-formatted data sets use
SET DATA_SET TMAP_set1, TMAP_set2, ...
TMAP_setn must be the name of a descriptor file for a data set that is in TMAP "GT" (grids-at-timesteps) or "TS" (time series) format. ("Ferret" format and "TMAP" format are synonyms.)
If the directory portion of the filename is omitted the environment variable FER_DESCR will be used to provide a list of directories to search. The order of directories in FER_DESCR determines the order of directory searches. If the extension is omitted a default of ".des" will be assumed (if the filename has more than one period, the extension must be given explicitly).
For every TMAP-formatted data set there is a descriptor file containing summary information about the contents of the data set. This includes variable names, units, grids, and coordinates. When the command SET DATA_SET is given to Ferret pointing to a GT-formatted or TS-formatted data set, it is the name of the descriptor file that must be specified.
Ferret can read binary data files that are formatted with and without FORTRAN record length headers (binary files without FORTRAN record length formatting are also known as "stream" files).
Ch2 Sec4.1. FORTRAN-structured binary files
Files containing record length information are created by FORTRAN programs using the ACCESS="SEQUENTIAL" (the FORTRAN default) mode of file creation and also by Ferret using LIST/FORMAT=unf. Files that contain FORTRAN record length headers must have all data aligned on a 4-byte boundary. Suppose "rrrr" represents 4 bytes of record length information and "dddd" represents a 4-byte data value. Then FORTRAN-structured files are organized in one of the following two ways:
Ch2 Sec4.1.1. Records of uniform length
A FORTRAN-structured file with records of uniform length (3 single-precision floating point data values per record in this figure) looks like this:
rrrr dddd dddd dddd rrrr ...
FORTRAN code that creates a data file of this type might look something like this (sequential access is the default and need not be specified in the OPEN statement):
REAL VAR1(10), VAR2(10), VAR3(10)
...
OPEN(UNIT=20, FORM='UNFORMATTED', ACCESS='SEQUENTIAL', FILE='MYFILE.DAT')
...
DO 10 I=1,10
WRITE (20) VAR1(I), VAR2(I), VAR3(I)
10 CONTINUE
....
To access data from this file, use
yes? SET DATA/EZ/FORMAT=UNF/VAR=var1,var2,var3/COL=3 myfile.dat
or,
yes? FILE/FORMAT=UNF/VAR=var1,var2,var3/COLUMNS=3 myfile.dat
This is very similar to accessing ASCII data with the addition of the /FORMAT=unf qualifier. The /COLUMNS= qualifier tells Ferret the number of data values per record. Although optional in the above example, this qualifier is required if the number of data values per record is greater than the number of variables being read (examples follow in section "ASCII Data").
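To make the "rrrr dddd dddd dddd rrrr" layout concrete, the following Python sketch builds and then parses such records in memory. It assumes little-endian data and 4-byte record markers that hold the payload length in bytes (common, but compiler dependent); `write_record` and `read_records` are hypothetical helpers.

```python
import io
import struct

def write_record(buf, values):
    """Write one record: length marker, REAL*4 payload, length marker."""
    payload = struct.pack("<%df" % len(values), *values)
    marker = struct.pack("<i", len(payload))
    buf.write(marker + payload + marker)

def read_records(buf, ncols):
    """Read uniform records of ncols REAL*4 values each until end of data."""
    records = []
    while True:
        head = buf.read(4)
        if len(head) < 4:
            break
        (nbytes,) = struct.unpack("<i", head)
        payload = buf.read(nbytes)
        (tail,) = struct.unpack("<i", buf.read(4))
        if tail != nbytes:
            raise ValueError("record markers disagree")
        records.append(struct.unpack("<%df" % ncols, payload))
    return records

buf = io.BytesIO()
write_record(buf, (1.0, 2.0, 3.0))   # var1(1), var2(1), var3(1)
write_record(buf, (1.5, 2.5, 3.5))   # var1(2), var2(2), var3(2)
nbytes_total = buf.tell()
buf.seek(0)
records = read_records(buf, ncols=3)
print(nbytes_total, records)
```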
Ch2 Sec4.1.2. Records of non-uniform length
A FORTRAN-structured file with variable-length records might look like this:
rrrr dddd dddd rrrr
rrrr dddd rrrr
rrrr dddd dddd dddd dddd rrrr
etc.
With care, it is possible to read a data file containing variable-length records which was created using the simplest unformatted FORTRAN OPEN statement and a single WRITE statement for each variable. Use /FORMAT=stream to read such files. Note that sequential access is the FORTRAN default and does not need to be specified in the OPEN statement:
REAL VAR1(1000), VAR2(500)
...
OPEN (UNIT=20, FORM="UNFORMATTED", FILE="MYFILE.DAT")
...
WRITE (20) VAR1
WRITE (20) VAR2
....
Use the qualifier /SKIP to skip past the record length information (/SKIP arguments are in units of words), and define a grid which will not read past the data values. The /COLUMNS= qualifier can be used when reading multiple variables to specify the number of words separating the start of each variable:
yes? DEFINE AXIS/X=1:500:1 xaxis
yes? DEFINE GRID/X=XAXIS mygrid
yes? FILE/FORMAT=stream/SKIP=1003/GRID=mygrid/VAR=var2 myfile.dat
The argument 1003 is the sum of the 1000 data words in record 1, plus 2 words of record length information surrounding the data values in record 1 (variable var1), plus 1 word of record information preceding the data in record 2.
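The word count can be checked with a small Python sketch that builds the two records in memory and then skips 1003 four-byte words to reach var2. It assumes little-endian data and 4-byte record markers; `fortran_record` is a hypothetical helper.

```python
import struct

def fortran_record(values):
    """One sequential unformatted record: marker, REAL*4 payload, marker."""
    payload = struct.pack("<%df" % len(values), *values)
    marker = struct.pack("<i", len(payload))
    return marker + payload + marker

var1 = [float(k) for k in range(1000)]
var2 = [float(k) + 0.5 for k in range(500)]
data = fortran_record(var1) + fortran_record(var2)

# 1000 data words + 2 marker words around record 1 + 1 marker word
# in front of record 2.
skip_words = 1000 + 2 + 1
start = skip_words * 4
got = struct.unpack("<500f", data[start:start + 500 * 4])

print(skip_words, got[:3])
```

Reading exactly 500 values also stops short of the trailing marker of record 2, which is why the grid must not extend past the data.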
Ch2 Sec4.1.3. Fortran binary files, variables on different grids.
Some FORTRAN-structured files have multiple variables per record which do not share a common grid. An example would be one year of a global monthly field stored as twelve records like this:
rrrr year month field(360x180) rrrr
The data file size is (1+1+1+360*180+1)*12*4 = 3110592 bytes. Such a file cannot be read with the /FORMAT=unf qualifier but can be read with the /FORMAT=stream qualifier described in the next section. By including the /SWAP qualifier, this technique can be used to read files created on a machine with a different byte ordering.
The following commands will read this file and assign the data to the appropriate grid:
yes? ! Create an X axis for an entire record.
yes? DEFINE AXIS/X=1:`3+360*180+1`:1 binary_x
yes? DEFINE AXIS/T=1:12:1 binary_t
yes? DEFINE GRID/X=binary_x/T=binary_t binary_g
yes? ! Read in everything.
yes? FILE/FORMAT=stream/G=binary_g/VAR=val binary_file
yes? ! Create the grid for the data field.
yes? DEFINE AXIS/MODULO/X=0.5:359.5:1 1deg_x
yes? DEFINE AXIS/Y=-89.5:89.5:1 1deg_y
yes? DEFINE AXIS/T=15-jan-1999:15-dec-1999:1/UNITS=month month_1999_t
yes? DEFINE GRID/X=1deg_x/Y=1deg_y/T=month_1999_t 1deg_1999_g
yes? ! Create a variable that uses this grid.
yes? LET dummy = x[GX=R_1deg_1999_g] + y[GY=R_1deg_1999_g] + t[GT=R_1deg_1999_g]
yes? ! Reshape the data portion of val onto the data grid.
yes? LET field = RESHAPE(val[i=4:`3+360*180`],dummy)
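The word counts used in these commands can be verified with a few lines of Python. This is only arithmetic on the record layout described above (one leading marker, year, month, the 360x180 field, one trailing marker); the variable names are illustrative.

```python
NX, NY, NT = 360, 180, 12
BYTES_PER_WORD = 4

# rrrr + year + month + field + rrrr, counted in 4-byte words
words_per_record = 1 + 1 + 1 + NX * NY + 1
file_bytes = words_per_record * NT * BYTES_PER_WORD

# The field occupies words 4 .. 3+NX*NY of each record (1-based),
# which is the range selected by val[i=4:`3+360*180`].
first_word, last_word = 4, 3 + NX * NY
field_words = last_word - first_word + 1

print(words_per_record, file_bytes, field_words)
```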
Ch2 Sec4.2. Stream binary files
Files without embedded record length information are created by FORTRAN programs using ACCESS="DIRECT" in OPEN statements and by C programs using the C stdio library. These files can contain a mix of integer and real numbers. The following types can be read from an unstructured file:
FORTRAN      C         Size in bytes
INTEGER*1    char      1
INTEGER*2    short     2
INTEGER*4    int       4
REAL*4       float     4
REAL*8       double    8
Ch2 Sec4.2.1. Simple stream files
Suppose "dddd" represents a 4-byte data value. Then a stream (or "direct access") binary file of FORTRAN REAL*4 or C floats is:
dddd dddd dddd dddd dddd dddd ...
The structure of the records is implied by the program accessing the data. FORTRAN code which generates a direct access binary file might look like this:
REAL*4 MYVAR(10,5)
...
C Use RECL=40 for machines that specify in bytes
OPEN(UNIT=20, FILE="myfile.dat", ACCESS="DIRECT", RECL=10)
...
DO 100 j = 1, 5
100 WRITE (20,REC=j) (MYVAR(i,j),i=1,10)
....
Use the following Ferret commands to read variable "myvar" from this file:
yes? DEFINE AXIS/X=1:10:1 x10
yes? DEFINE AXIS/Y=1:5:1 y5
yes? DEFINE GRID/X=x10/Y=y5 g10x5
yes? FILE/VAR=MYVAR/GRID=g10x5/FORMAT=stream myfile.dat
If the file consisted of a set of FORTRAN REAL*8 or C doubles, then the data would look like:
dddddddd dddddddd dddddddd ...
and the following Ferret commands would read the data into "myvar":
yes? DEFINE AXIS/X=1:10:1 x10
yes? DEFINE AXIS/Y=1:5:1 y5
yes? DEFINE GRID/X=x10/Y=y5 g10x5
yes? FILE/VAR=MYVAR/GRID=g10x5/FORMAT=stream/type=r8 myfile.dat
Note the addition of the "type" qualifier. See the FILE command (p. 357) for more details.
Since Ferret represents all variables as REAL*4, some precision is lost when reading in REAL*8 or INTEGER*4 values. Also, some REAL*8 numbers cannot be represented as REAL*4 numbers; the internal Ferret value of such a number is system dependent.
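The precision loss can be demonstrated outside Ferret with Python's struct module, which rounds a double to the nearest 4-byte float when packing with the "f" format. The helper `to_real4` is hypothetical and only mimics the REAL*8 to REAL*4 conversion; it says nothing about what Ferret stores for out-of-range values.

```python
import struct

def to_real4(x):
    """Round a Python float (REAL*8) to the nearest REAL*4 and back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

exact = to_real4(0.5)            # representable in both sizes
rounded = to_real4(0.1)          # picks up a small rounding error
big_int = to_real4(16777217.0)   # 2**24 + 1 needs more than 24 mantissa bits

# A value far outside the REAL*4 range cannot be converted at all;
# Python reports this as an OverflowError.
try:
    to_real4(1.0e300)
    out_of_range = False
except OverflowError:
    out_of_range = True

print(exact, rounded, big_int, out_of_range)
```

The integer case shows why large INTEGER*4 values are also affected: above 2**24, not every integer has a REAL*4 representation.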
Ch2 Sec4.2.2. Mixed stream files
Ferret can read binary files that contain a mix of numbers of different types. However, a given Ferret variable can only be one type. Say you have a file containing a mix of REAL*8 and REAL*4 numbers:
dddddddd dddd dddddddd dddd dddddddd ...
The following would successfully read the file:
yes? FILE/VAR=MYDOUBLE,MYFLOAT/GRID=somegrid/FORMAT=stream/type=r8,r4 myfile.dat
while:
yes? FILE/VAR=MYDOUBLE/GRID=someothergrid/FORMAT=stream/type=r8,r4 myfile.dat
would fail, because two types are listed for a single variable.
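The interleaved "dddddddd dddd" layout can be reproduced and decoded in Python, which may help when checking such a file before reading it in Ferret. The sketch assumes little-endian data with no padding between values; the sample numbers are made up.

```python
import struct

# Build a small mixed stream: one REAL*8 followed by one REAL*4, repeated.
pairs = [(1.0, 0.5), (2.0, 1.5), (3.0, 2.5)]
data = b"".join(struct.pack("<df", d, f) for d, f in pairs)

# Decode it again, one type per variable, in the same order as /type=r8,r4.
mydouble, myfloat = [], []
for d, f in struct.iter_unpack("<df", data):
    mydouble.append(d)
    myfloat.append(f)

print(len(data), mydouble, myfloat)
```

Each pair occupies 12 bytes, so the number of variables and the number of types have to agree for the stream to be split correctly.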
Ch2 Sec4.2.3. Byte-swapped stream files
Stream files with byte-swapped numbers can be read with the /SWAP qualifier. Note that the /ORDER and /SKIP qualifiers are also available (see chapter "Data Set Basics", section "Reading ASCII files," p. 46, for more details on /ORDER and /SKIP).
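What byte swapping means for 4-byte values can be sketched in Python. The example assumes the file was written big-endian and is read on a little-endian machine; `swap4` is a hypothetical helper that reverses the bytes of each 4-byte word.

```python
import struct

def swap4(raw):
    """Reverse the byte order of every 4-byte word in raw."""
    return b"".join(raw[k:k + 4][::-1] for k in range(0, len(raw), 4))

# Three REAL*4 values as a big-endian machine would write them.
raw = struct.pack(">3f", 1.0, 2.0, 3.0)

misread = struct.unpack("<3f", raw)          # read without swapping
swapped = struct.unpack("<3f", swap4(raw))   # read after swapping

print(misread)
print(swapped)
```

Without the swap the values are still valid floats, just wildly wrong, so a mismatch in byte order is not always reported as an error.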
To access ASCII data file sets use
yes? SET DATA/EZ ASCII_file_name
or equivalently
yes? FILE ASCII_file_name
The following are qualifiers to SET DATA/EZ or FILE:
Qualifier     Description
/VARIABLES    names the variables in the file
/TITLE        associates a title with the data set
/GRID         indicates multi-dimensional data and units
/COLUMNS      tells how many data values are in each record
/FORMAT       specifies the format of the records
/SKIP         skips initial records of the file
/ORDER        specifies order of axes (which varies fastest)
Use command SET VARIABLE to individually customize the variables.
Ch2 Sec5.1. Reading ASCII files
Below are several examples of reading ASCII data properly. (Uniform record length, FORTRAN-structured binary data are read similarly with the addition of the qualifier /FORMAT=unf. See the chapter on "Data Set Basics", section "Binary Data," p. 41, for other binary types). First, we look briefly at the relationship between Ferret and standard matrix notation.
Linear algebra uses established conventions in matrix notation. In a matrix A(i,j), the first index denotes a (horizontal) row and the second denotes a (vertical) column.
A11   A12   A13   ...   A1n
A21   A22   A23   ...   A2n
...
Am1   Am2   Am3   ...   Amn

Matrix A(i,j)
X-Y graphs follow established conventions as well, which are that X is the horizontal axis (and in a geographical context, the longitude axis) and increases to the right, and Y is the vertical axis (latitude) and increases upward (Ferret provides the /DEPTH qualifier to explicitly designate axes where the vertical axis convention is reversed).
In Ferret, the first index of a matrix, i, is associated with the first index of an (x,y) pair, x. Likewise, j corresponds to y. Element Am2, for example, corresponds graphically to x=m and y=2.
By default, Ferret stores data in the same manner as FORTRAN: the first index varies fastest. Use the qualifier /ORDER to alter this behavior. The following examples demonstrate how Ferret handles matrices.
Example 1: 1 variable, 1 dimension
1a) Consider a data set containing the height of a plant at regular time intervals, listed in a single column:
2.3
3.1
4.5
5.6
. . .
To access, name, and plot this variable properly, use the commands
yes? FILE/VAR=height plant.dat
yes? PLOT height
1b) Now consider the same data, except listed in four columns:
2.3 3.1 4.5 5.6
5.7 5.9 6.1 7.2
. . .
Because there are more values per record (4) than variables (1), use:
yes? FILE/VAR=height/COLUMNS=4 plant4.dat
yes? PLOT height
Example 2: 1 variable, 1 dimension, with a large number of data points
The simple FILE command:
yes? FILE/VAR=height plant.dat
uses an abstract axis of fixed length, 20480 points. If your data is larger than that, you can read the data by defining an axis of appropriate length. Set the length to a number equal to or larger than the dimension of your data. The plot command will plot the actual number of points in the file.
yes? DEFINE AXIS/X=1:50000:1 longax
yes? DEFINE GRID/X=longax biggrid
yes? FILE/VAR=height/GRID=biggrid plant.dat
yes? PLOT height
Example 3: 2 variables, 1 dimension
3a) Consider a data set containing the height of a plant and the amount of water given to the plant, measured at regular time intervals:
2.3 20.4
3.1 31.2
4.5 15.7
5.6 17.3
. . .
To read and plot this data use
yes? FILE/VAR="height,water" plant_wat.dat
yes? PLOT height,water
3b) The number of columns need be specified only if the number of columns exceeds the number of variables. If the data are in six columns
2.3 20.4 3.1 31.2 4.5 15.7
5.6 17.3 ...
use
yes? FILE/VAR="height,water"/COLUMNS=6 plant_wat6.dat
yes? PLOT height,water
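The way values are shared out among variables in Examples 3a and 3b can be pictured with a short Python sketch. It assumes free-format, whitespace-separated values dealt out to the variables in turn, and it ignores how Ferret treats record boundaries; `read_columns` is a hypothetical helper.

```python
def read_columns(text, names):
    """Deal whitespace-separated values out to the named variables in turn."""
    values = [float(tok) for tok in text.split()]
    out = {name: [] for name in names}
    for k, value in enumerate(values):
        out[names[k % len(names)]].append(value)
    return out

two_col = "2.3 20.4\n3.1 31.2\n4.5 15.7\n"     # layout of Example 3a
six_col = "2.3 20.4 3.1 31.2 4.5 15.7\n"       # layout of Example 3b

a = read_columns(two_col, ["height", "water"])
b = read_columns(six_col, ["height", "water"])
print(a["height"], a["water"])
print(a == b)
```

Both layouts carry the same sequence of values; /COLUMNS only tells Ferret how many of them sit on each record.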
Example 4: 1 variable, 2 dimensions
4a) Consider a different situation: a greenhouse with three rows of four plants and a file with a single column of data representing the height of each plant at a single time (successive values represent plants in a row of the greenhouse):
3.1
2.6
5.4
4.6
3.5
6.1
. . .
If we want to produce a contour plot of height as a function of position in the greenhouse, axes will have to be defined:
yes? DEFINE AXIS/X=1:4:1 xplants
yes? DEFINE AXIS/Y=1:3:1 yplants
yes? DEFINE GRID/X=xplants/Y=yplants gplants
yes? FILE/VAR=height/GRID=gplants greenhouse_plants.dat
yes? CONTOUR height
When reading data the first index, x, varies fastest. Schematically, the data will be assigned as follows:
      x=1   x=2   x=3   x=4
y=1   3.1   2.6   5.4   4.6
y=2   3.5   6.1   . . .
y=3   . . .
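The same assignment can be written as index arithmetic. The Python sketch below places the k-th value read onto a 4x3 grid with the first index (x) varying fastest; `place_on_grid` is a hypothetical helper, not a Ferret function.

```python
def place_on_grid(values, nx, ny):
    """Map sequential values to 1-based (i, j) positions, x varying fastest."""
    grid = {}
    for k, value in enumerate(values):
        if k >= nx * ny:
            break                      # ignore values beyond the grid
        i = k % nx + 1                 # x index cycles 1..nx
        j = k // nx + 1                # y index advances every nx values
        grid[(i, j)] = value
    return grid

heights = [3.1, 2.6, 5.4, 4.6, 3.5, 6.1]
grid = place_on_grid(heights, nx=4, ny=3)
print(grid[(1, 1)], grid[(4, 1)], grid[(1, 2)], grid[(2, 2)])
```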
4b) If the file in the above example has, instead, 4 values per record:
3.1 2.6 5.4 4.6
3.5 6.1 . . .
then add /COLUMNS=4 to the FILE command:
yes? FILE/VAR=height/COLUMNS=4/GRID=gplants greenhouse_plants.dat
Example 5: 2 variables, 2 dimensions
As in Example 4, consider a greenhouse with three rows of four plants each, and a data set with the height of each plant and the length of its longest leaf:
3.1 0.54
2.6 0.37
5.4 0.66
4.6 0.71
3.5 0.14
6.1 0.95
. . .
Again, axes and a grid must be defined:
yes? DEFINE AXIS/X=1:4:1 xht_leaf
yes? DEFINE AXIS/Y=1:3:1 yht_leaf
yes? DEFINE GRID/X=xht_leaf/Y=yht_leaf ght_leaf
yes? FILE/VAR="height,leaf"/GRID=ght_leaf greenhouse_ht_lf.dat
yes? SHADE height
yes? CONTOUR/OVER leaf
The above commands create a color-shaded plot of height in the greenhouse, and overlay a contour plot of leaf length. Schematically, the data will be assigned as follows:
      x=1         x=2         x=3         x=4
      ht , lf     ht , lf     ht , lf     ht , lf
y=1   3.1, 0.54   2.6, 0.37   5.4, 0.66   4.6, 0.71
y=2   3.5, 0.14   6.1, 0.95   . . .
y=3   . . .
Example 6: 2 variables, 3 dimensions (time series)
Consider the same greenhouse with height and leaf length data taken at twelve different times. The following commands will create a three-dimensional grid and a plot of the height and leaf length versus time for a specific plant.
yes? DEFINE AXIS/X=1:4:1 xplnt_tm
yes? DEFINE AXIS/Y=1:3:1 yplnt_tm
yes? DEFINE AXIS/T=1:12:1 tplnt_tm
yes? DEFINE GRID/X=xplnt_tm/Y=yplnt_tm/T=tplnt_tm gplant2
yes? FILE/VAR="height,leaf"/GRID=gplant2 green_time.dat
yes? PLOT/X=3/Y=2 height, leaf
Example 7: 1 variable, 3 dimensions, permuted order (vertical profile)
Consider a collection of oceanographic measurements made to a depth of 1000 meters. Suppose that the data file contains only a single variable, salt. Each record contains a vertical profile (11 values) of a particular x,y (long,lat) position. Supposing that successive records are successive longitudes, the data file would look as follows (assume the equivalencies are not in the file):
             z=0    z=100  z=200  . . .
x=30W,y=5S   35.89  35.90  35.93  35.97  36.02  36.05  35.96  35.40  35.13  34.89  34.72
x=29W,y=5S   35.89  35.91  35.94  35.97  36.01  36.04  35.94  35.39  35.13  34.90  34.72
. . .
Use the qualifier /DEPTH= when defining the Z axis to indicate positive downward, and /ORDER when setting the data set to properly read in the permuted data:
yes? DEFINE AXIS/X=30W:25W:1/UNIT=degrees salx
yes? DEFINE AXIS/Y=5S:5N:1/UNIT=degrees saly
yes? DEFINE AXIS/Z=0:1000:100/UNIT=meters/DEPTH salz
yes? DEFINE GRID/X=salx/Y=saly/Z=salz salgrid
yes? FILE/ORDER=zxy/GRID=salgrid/VAR=sal/COL=11 sal.dat
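To make the permuted order concrete, here is a minimal Python sketch of how successive values in such a file map to grid positions when Z varies fastest, then X, then Y. It is not Ferret code; the function name position_in_file is hypothetical, the indices are 0-based, and the data are the two profile records shown above.

```python
def position_in_file(k, nz, nx):
    """Map the k-th value (0-based) of a z-x-y ordered file to (ix, iy, iz)."""
    iz = k % nz              # depth varies fastest: one profile per record
    ix = (k // nz) % nx      # successive records are successive longitudes
    iy = k // (nz * nx)      # latitude advances after a full sweep of longitudes
    return ix, iy, iz


profiles = [
    # x=30W, y=5S
    35.89, 35.90, 35.93, 35.97, 36.02, 36.05, 35.96, 35.40, 35.13, 34.89, 34.72,
    # x=29W, y=5S
    35.89, 35.91, 35.94, 35.97, 36.01, 36.04, 35.94, 35.39, 35.13, 34.90, 34.72,
]
NZ, NX = 11, 6   # 11 depths per profile; 30W:25W:1 gives 6 longitudes
salt = {position_in_file(k, NZ, NX): v for k, v in enumerate(profiles)}
print(salt[(1, 0, 1)])   # second longitude, first latitude, second depth
```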
Ch2 Sec5.2. Reading "DELIMITED" data files
SET DATA/FORMAT=DELIMITED[/DELIMITERS=][/TYPE=][/VAR=] filename
For "delimited" files, such as the output of spreadsheets, SET DATA/FORMAT=DELIMITED initializes files of mixed numerical, string, and date fields. If the data types are not specified, the file is analyzed automatically to determine them.
The alias COLUMNS stands for "SET DATA/FORMAT=DELIMITED". (See p.402 for the full syntax.)
Example 1: Strings, latitudes, longitudes, and numeric data.
This file is delimited by commas. Some entries are null; they are indicated by two commas with no space between. File delimited_read_1.dat contains:
col1, col2 col3 col4 col5 col6 col7
one ,, 1.1, 24S, 130E ,, 1e1
two ,, 2.2, 24N, 130W, 2S
three ,, 3.3, 24, 130, 3N, 3e-2
five ,, 4.4, -24, -130, 91, -4e2
extra line
If there is no /TYPE qualifier, the data type is determined automatically: if all entries in a column match a data type, the column is assigned that type. First let's try the file as is, using automatic analysis. Record 1 contains 5 column headings (text), so V1 through V5 are analyzed as text variables.
yes? FILE/FORMAT=delim delimited_read_1.dat
yes? LIST v1,v2,v3,v4,v5,v6,v7,v8
DATA SET: ./delimited_read_1.dat
X: 0.5 to 7.5
Column 1: V1
Column 2: V2
Column 3: V3
Column 4: V4
Column 5: V5
Column 6: V6
Column 7: V7
V1 V2 V3 V4 V5 V6 V7
1 / 1: "col1" "col2" "col3" "col4" "col5" " " ....
2 / 2: "one" " " "1.1" "24S" "130E" " " 10.0
3 / 3: "two" " " "2.2" "24N" "130W" "2S" ....
4 / 4: "three" " " "3.3" "24" "130" "3N" 0.0
5 / 5: " " " " " " " " " " " " ....
6 / 6: "five" " " "4.4" "-24" "-130" "91" -400.0
7 / 7: "extra line" " " " " " " " " " " ....
Now skip the first record to do a better "analysis" of the file fields, and explicitly name the variables. Note that A3 is correctly analyzed as numeric, A4 as latitude, and A5 as longitude. A6 is analyzed as string data, because the value 91 in record 5 does not fall in the range for latitudes, and records 2 and 3 contain mixed numbers and letters.
yes? FILE/FORMAT=DELIM/SKIP=1/VAR="a1,a2,a3,a4,a5,a6,a7,a8,a9" delimited_read_1.dat
yes? LIST a1,a2,a3,a4,a5,a6,a7
DATA SET: ./delimited_read_1.dat
X: 0.5 to 6.5
Column 1: A1
Column 2: A2 is A2 (all values missing)
Column 3: A3
Column 4: A4 is A4 (degrees_north)(Latitude)
Column 5: A5 is A5 (degrees_east)(Longitude)
Column 6: A6
Column 7: A7
A1 A2 A3 A4 A5 A6 A7
1 / 1: "one" ... 1.100 -24.00 130.0 " " 10.0
2 / 2: "two" ... 2.200 24.00 -130.0 "2S" ....
3 / 3: "three" ... 3.300 24.00 130.0 "3N" 0.0
4 / 4: " " ... .... .... .... " " ....
5 / 5: "five" ... 4.400 -24.00 -130.0 "91" -400.0
6 / 6: "extra line"... .... .... .... " " ....
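The kind of column analysis described above can be sketched in Python. This is a simplified illustration and not Ferret's actual algorithm: the names classify_column and magnitude are hypothetical, the accepted number and coordinate spellings are assumptions, and so is the 360-degree limit used for longitudes. The column entries are taken from delimited_read_1.dat after the heading record, with empty strings standing for null fields.

```python
import re

NUMBER = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?")
LATITUDE = re.compile(r"(\d+\.?\d*)[NSns]")
LONGITUDE = re.compile(r"(\d+\.?\d*)[EWew]")


def magnitude(entry, suffixed):
    """Absolute value of a plain number or a suffixed coordinate, else None."""
    match = suffixed.fullmatch(entry)
    if match:
        return float(match.group(1))
    if NUMBER.fullmatch(entry):
        return abs(float(entry))
    return None


def classify_column(entries):
    """Guess one type for a whole column from its non-empty entries."""
    values = [e.strip() for e in entries if e.strip()]
    if not values:
        return "all missing"
    if all(NUMBER.fullmatch(v) for v in values):
        return "numeric"
    # Assumed limits: 90 degrees for latitude, 360 for longitude.
    for kind, pattern, limit in (("latitude", LATITUDE, 90.0),
                                 ("longitude", LONGITUDE, 360.0)):
        sizes = [magnitude(v, pattern) for v in values]
        if all(s is not None and s <= limit for s in sizes):
            return kind
    return "string"


columns = {
    "A3": ["1.1", "2.2", "3.3", "", "4.4", ""],
    "A4": ["24S", "24N", "24", "", "-24", ""],
    "A5": ["130E", "130W", "130", "", "-130", ""],
    "A6": ["", "2S", "3N", "", "91", ""],
}
for name, entries in columns.items():
    print(name, classify_column(entries))
```

In this sketch A6 falls through to "string" for the same reason given in the text: 91 exceeds the latitude limit, and "2S" and "3N" are neither plain numbers nor longitudes.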
Now use the /TYPE qualifier to specify that all columns be treated as numeric.
yes? FILE/FORMAT=delim/SKIP=1/TYPE=numeric delimited_read_1.dat
yes? LIST v1,v2,v3,v4,v5,v6,v7,v8
DATA SET: ./delimited_read_1.dat
X: 0.5 to 6.5
Column 1: V1
Column 2: V2
Column 3: V3
Column 4: V4
Column 5: V5
Column 6: V6
Column 7: V7
V1 V2 V3 V4 V5 V6 V7
1 / 1:...... 1.100 .... .... .... 10.0
2 / 2:...... 2.200 .... .... .... ....
3 / 3:...... 3.300 24.00 130.0 .... 0.0
4 / 4:...... .... .... .... .... ....
5 / 5:...... 4.400 -24.00 -130.0 91.00 -400.0
6 / 6:...... .... .... .... .... ....
Here is how to read only the first line of the file. If the variables are not specified, 7 variables are generated, because auto-analysis of the file doesn't stop at the first record. Use the command COLUMNS, the alias for FILE/FORMAT=delimited:
yes? DEFINE AXIS/X=1:1:1 x1
yes? DEFINE GRID/X=x1 g1
yes? COLUMNS/GRID=g1 delimited_read_1.dat
yes? LIST v1,v2,v3,v4,v5,v6,v7
DATA SET: ./delimited_read_1.dat
X: 1
Column 1: V1
Column 2: V2
Column 3: V3
Column 4: V4
Column 5: V5
Column 6: V6
Column 7: V7
V1 V2 V3 V4 V5 V6 V7
I / *: "col1" "col2" "col3" "col4" "col5" " " ... " "
Define the variables to read.
yes? COLUMNS/GRID=g1/VAR="c1,c2,c3,c4,c5" delimited_read_1.dat
yes? LIST c1,c2,c3,c4,c5
DATA SET: ./delimited_read_1.dat
X: 1
Column 1: C1
Column 2: C2
Column 3: C3
Column 4: C4
Column 5: C5
C1 C2 C3 C4 C5
I / *: "col1" "col2" "col3" "col4" "col5"
Example 2: File using blank as a delimiter.
Ferret recognizes the file as containing date and time variables, which are explored further in Example 3 below. Here is the file delimited_read_2.dat; record 2 consists of many blanks.
1981/12/03 12:35:00
1895/2/6 13:45:05
Read the file using /DELIMITER=" ":
yes? FILE/FORM=delimited/DELIMITER=" " delimited_read_2.dat
yes? LIST v1,v2
DATA SET: ./delimited_read_2.dat
X: 0.5 to 3.5
Column 1: V1 is V1 (days)(Julian days since 1-Jan-1900)
Column 2: V2 is V2 (hours)(Time of day)
V1 V2
1 / 1: 37965. 12.58
2 / 2: .... ....
3 / 3: 39051. 13.75
Example 3: dates and times
Note that record 3 has syntax errors in the first 2 fields. Here is delimited_read_3.dat:
12/1/99, 12:00, 12/1/99, 1999-03-01, 12:00, 13:45:36.5
12/2/99, 01:00:13.5, 12/2/99, 1999-03-02, 01:00:13.5, 14:45:36.5
12/3/99x, 2:00x, 12/3/99, 1999-03-03, 2:00, 15:45
12/4/99, 03:00, 12/4/99, 1999-03-04, 03:00, 16:45:36.5
Read with auto-analysis. The record with syntax errors causes variables 1 and 2 to be read as string variables.
yes? COLUMNS delimited_read_3.dat
yes? LIST v1,v2,v3,v4,v5,v6
DATA SET: ./delimited_read_3.dat
X: 0.5 to 4.5
Column 1: V1
Column 2: V2
Column 3: V3 is V3 (days)(Julian days since 1-Jan-1900)
Column 4: V4 is V4 (days)(Julian days since 1-Jan-1900)
Column 5: V5 is V5 (hours)(Time of day)
Column 6: V6 is V6 (hours)(Time of day)
V1 V2 V3 V4 V5 V6
1 / 1: "12/1/99" "12:00" 36493. 36218. 12.00 13.76
2 / 2: "12/2/99" "01:00:13.5" 36494. 36219. 1.00 14.76
3 / 3: "12/3/99x" "2:00x" 36495. 36220. 2.00 15.75
4 / 4: "12/4/99" "03:00" 36496. 36221. 3.00 16.76
Use the date variables in v3 and v4 to define time axes. The date encodings are as expected.
yes? DEFINE AXIS/T/UNITS=days/T0=1-jan-1900 tax = v3
yes? SHOW AXIS tax
name axis # pts start end
TAX TIME 4 r 01-DEC-1999 00:00 04-DEC-1999 00:00
T0 = 1-JAN-1900
yes? DEFINE AXIS/T/UNITS=days/T0=1-jan-1900 tax = v4
yes? SHOW AXIS tax
name axis # pts start end
TAX TIME 4 r 01-MAR-1999 00:00 04-MAR-1999 00:00
T0 = 1-JAN-1900
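The encodings in the listing can be checked independently with a short Python sketch. It is not Ferret code: the function names days_since_1900 and hours_of_day are hypothetical, and only the two date spellings that occur in delimited_read_3.dat (mm/dd/yy and yyyy-mm-dd) are handled.

```python
from datetime import date, datetime

EPOCH = date(1900, 1, 1)


def days_since_1900(text):
    """Encode a date string as whole days since 1-Jan-1900, or None if malformed."""
    for fmt in ("%m/%d/%y", "%Y-%m-%d"):
        try:
            parsed = datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
        return (parsed - EPOCH).days
    return None   # e.g. "12/3/99x"


def hours_of_day(text):
    """Encode hh:mm or hh:mm:ss.s as fractional hours, or None if malformed."""
    try:
        parts = [float(p) for p in text.strip().split(":")]
    except ValueError:
        return None   # e.g. "2:00x"
    if not 2 <= len(parts) <= 3:
        return None
    parts += [0.0] * (3 - len(parts))   # seconds default to zero
    return parts[0] + parts[1] / 60 + parts[2] / 3600


print(days_since_1900("12/1/99"), days_since_1900("1999-03-01"))
print(round(hours_of_day("13:45:36.5"), 2))
```

A field that fails to parse returns None here, which corresponds to the missing-value flag that a typed read reports for the malformed record.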
Next we'll specify each column's type. Only the first two characters of the type are needed. Now we can read those columns which had errors, except for the record with the errors.
yes? COLUMNS/TYPE="da,ti,date, date, time, time" delimited_read_3.dat
yes? LIST v1,v2,v3,v4,v5,v6
DATA SET: ./delimited_read_3.dat
X: 0.5 to 4.5
Column 1: V1 is V1 (days)(Julian days since 1-Jan-1900)
Column 2: V2 is V2 (hours)(Time of day)
Column 3: V3 is V3 (days)(Julian days since 1-Jan-1900)
Column 4: V4 is V4 (days)(Julian days since 1-Jan-1900)
Column 5: V5 is V5 (hours)(Time of day)
Column 6: V6 is V6 (hours)(Time of day)
V1 V2 V3 V4 V5 V6
1 / 1: 36493. 12.00 36493. 36218. 12.00 13.76
2 / 2: 36494. 1.00 36494. 36219. 1.00 14.76
3 / 3: .... .... 36495. 36220. 2.00 15.75
4 / 4: 36496. 3.00 36496. 36221. 3.00 16.76
Delimiters can also be used to break up individual fields. Here both the slash and the comma are used as delimiters; the comma is written as backslash-comma (\,):
yes? FILE/FORM=delim/DELIM="/,\," delimited_read_3.dat
yes? LIST V1,V2,V3,V4,v5,v6
DATA SET: ./delimited_read_3.dat
X: 0.5 to 4.5
Column 1: V1
Column 2: V2
Column 3: V3
Column 4: V4
Column 5: V5
Column 6: V6
V1 V2 V3 V4 V5 V6
1 / 1: 12.00 1.000 "99" "12:00" 12.00 1.000
2 / 2: 12.00 2.000 "99" "01:00:13.5" 12.00 2.000
3 / 3: 12.00 3.000 "99x" "2:00x" 12.00 3.000
4 / 4: 12.00 4.000 "99" "03:00" 12.00 4.000
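The effect of using two delimiters at once can be reproduced outside Ferret with a few lines of Python. This is only a sketch of the splitting step; the helper name split_fields is hypothetical, and the record is the first line of delimited_read_3.dat.

```python
import re


def split_fields(record, delimiters):
    """Split a record on any of the given single-character delimiters."""
    pattern = "[" + re.escape(delimiters) + "]"
    return [field.strip() for field in re.split(pattern, record)]


record = "12/1/99, 12:00, 12/1/99, 1999-03-01, 12:00, 13:45:36.5"
fields = split_fields(record, "/,")
print(fields[:6])
```

The first six fields correspond to V1 through V6 in the listing: each date has been broken into month, day, and year, so the time "12:00" now sits in the fourth column.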
Ch2 Sec6. TRICKS TO READING BINARY AND ASCII FILES
Since binary and ASCII files are found in a bewildering variety of non-standardized formats, a few tricks may help with reading difficult cases.
Unix commands:
ln -s my_data my_dat.v1
ln -s my_data my_dat.v2
ln -s my_data my_dat.v3
Ferret commands:
yes? FILE/SKIP=0/VAR=v1 my_dat.v1
yes? FILE/SKIP=100/VAR=v2 my_dat.v2
yes? FILE/SKIP=200/VAR=v3 my_dat.v3
Ch2 Sec7. ACCESS TO REMOTE DATA SETS WITH DODS
DODS is now called OPeNDAP; we continue to refer to it as DODS in this manual for now. DODS, the Distributed Oceanographic Data System, allows users to access data anywhere on the Internet using a variety of client/server methods, including Ferret. Employing technology similar to that used by the World Wide Web, DODS and Ferret create a powerful tool for retrieving, sampling, analyzing, and displaying datasets, regardless of size or data format (though there are data format limitations).
For more information, please see the OPeNDAP home page.
Like the WWW, DODS is an emerging technology that is still under development, so the details of how things are accomplished are likely to change.
Ch2 Sec7.2. Accessing Remote Data Sets
Datasets are accessed through Ferret using their raw Uniform Resource Locator (URL) address. Make sure to enclose the address in quotes, as for any dataset name that includes a path. For example, to access the COADS climatology, hosted at PMEL:
yes? use "http://ferret.pmel.noaa.gov/cgi-bin/nph-nc/data/coads_climatology.nc"
Once the dataset has been initialized, it is used just like any other local dataset.
yes? list/x=140w/y=2n/t="16-Feb" sst
SEA SURFACE TEMPERATURE (Deg C)
LONGITUDE: 141W
LATITUDE: 1N
TIME: 15-FEB 16:29
DATA SET: http://ferret.pmel.noaa.gov/cgi-bin/nph-nc/data/coads_climatology.nc
26.39
To locate DODS data, you can search the NVODS/DODS list of DODS datasets (http://www.opendap.org/data/datasets.cgi?&exfunction=none&xmlfilename=datasets.xml) or the Global Change Master Directory (http://gcmd.gsfc.nasa.gov/).
For the time being, netCDF and HDF files can be read via DODS by Ferret. As DODS (OPeNDAP) netCDF libraries become available, other data types will be made available.
Note that HDF files can be read by Ferret only via DODS; that is, the HDF file must first be placed on a DODS server, and Ferret can then access it through its DODS URL. Even by this means, Ferret will succeed in reading the file only if its structure is similar to that of a COARDS or CF netCDF file. Often, you will need to apply the USE/ORDER= qualifier to change the ordering of the coordinate axes.
If a file is on a DODS server, you can look at the DAS in your browser (the URL that ends in .das). When you look at the attribute data, check whether there are dimension variables with attributes that look like latitude or longitude, as in the COARDS conventions.
Ch2 Sec7.3. Debugging Access to Remote DODS Data Sets
To find out more about a particular dataset, or to debug problems, you can view three elements of the dataset in a web browser. To access this information, append .dds, .das, or .info to the dataset URL. For example:
http://ferret.pmel.noaa.gov/cgi-bin/nph-nc/data/coads_climatology.nc.dds
DDS stands for Dataset Descriptor Structure; this URL returns a text description of the data set's structure.
http://ferret.pmel.noaa.gov/cgi-bin/nph-nc/data/coads_climatology.nc.das
DAS stands for Dataset Attribute Structure and this will return a text description of attributes assigned to the variables in the data set.
http://ferret.pmel.noaa.gov/cgi-bin/nph-nc/data/coads_climatology.nc.info
This will return a text description of the variables in the dataset.
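As a small convenience, the three debugging URLs can be derived from a dataset URL programmatically. The Python sketch below only builds the strings and does not contact any server; the helper name descriptor_urls is hypothetical.

```python
def descriptor_urls(dataset_url):
    """Return the .dds, .das, and .info URLs for a DODS dataset URL."""
    return {suffix: f"{dataset_url}.{suffix}" for suffix in ("dds", "das", "info")}


urls = descriptor_urls(
    "http://ferret.pmel.noaa.gov/cgi-bin/nph-nc/data/coads_climatology.nc"
)
for name, url in urls.items():
    print(name, url)
```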
Some DODS data providers will choose to control access to some or all of their data. When you request data from one of these servers, the DODS client will prompt you for a username and password. If you want to avoid the prompt, you can embed a username and password in the URL, like this:
http://user:password@www.dods.org/nph-dods/etc...
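A URL of this form can be assembled with Python's standard urllib.parse module, as in the sketch below. The helper name with_credentials and the example.org address are placeholders, not real DODS servers. Percent-encoding special characters in the username and password is standard URL practice; whether a particular DODS client decodes such escapes is an assumption not covered here.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def with_credentials(url, user, password):
    """Return url with user:password@ inserted before the host name."""
    parts = urlsplit(url)
    # Escape characters such as "@" or ":" that would otherwise break the URL.
    userinfo = f"{quote(user, safe='')}:{quote(password, safe='')}"
    return urlunsplit(parts._replace(netloc=f"{userinfo}@{parts.netloc}"))


# Placeholder server and credentials, for illustration only.
print(with_credentials("http://example.org/nph-dods/data.nc", "user", "p@ss"))
```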
Ch2 Sec7.5. Sharing Data Sets via DODS
One of the most powerful aspects of DODS is the ease with which it allows data to be shared. With just a few simple steps, anyone running a web server can also be a DODS data server, thereby allowing data set access to anyone with an Internet connection.
Simply copying a few precompiled binaries into the cgi-bin directory of an already configured httpd server is all it takes to become a DODS server. Once the server is configured, adding or removing data sets is as simple as copying them to the server data directory or deleting them from that directory.
This ability has such immense potential that it bears extra emphasis. Imagine that within seconds of finishing a model run, a remote colleague is able to look at your dataset with whatever DODS client he or she desires, be it Ferret, Matlab, or another. There is no need for you to package up the data or for your colleague to download or reformat it; it is ready to be analyzed right away.
The DODS client can cache frequently accessed DODS-served datasets to produce a quicker response when requesting remote data. The first time you access a DODS data set, a file called .dodsrc is created in the user's home directory. This file is the DODS client initialization file. Please see the DODS Users Guide (http://www.opendap.org/user/guide-html/guide.html/guide_72.html) for details of the parameters that this file contains. Initially, DODS caching is turned off. To turn caching on, change the line in the newly created ~/.dodsrc file from
USE_CACHE=0
to
USE_CACHE=1
If you edit the .dodsrc file, make sure that every line in it starts in the first column.
The next time Ferret is run and a DODS-served dataset is accessed, a directory called .dods_cache will be created, typically in the user's home directory. The location of the DODS cache directory can be controlled by the line
CACHE_ROOT=/home/twaits/.dods_cache/
in the user's .dodsrc file. This directory is where all the cached information is stored. To clear the DODS cache, simply delete the .dods_cache directory and all of its contents (for example, rm -rf ~/.dods_cache). This directory will be recreated and repopulated with caching information the next time data is accessed via DODS, if caching is turned on. All of the parameter values in the .dodsrc file can be modified to better suit individual needs, and will take effect the next time Ferret is run and DODS-served data is accessed. Again, see the section "The OPeNDAP Client Initialization File (.dodsrc)" in the DODS Users Guide (http://www.opendap.org/user/guide-html/guide.html) for more detailed information.
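The edit to ~/.dodsrc described above can be sketched as a small Python text transformation. This is a hedged illustration that works on a string rather than on the real file: the function name enable_cache is hypothetical, and the sample text contains only the two settings mentioned in this section, not a complete .dodsrc.

```python
def enable_cache(dodsrc_text):
    """Return .dodsrc text with USE_CACHE set to 1 and no indented lines."""
    lines = []
    for line in dodsrc_text.splitlines():
        line = line.lstrip()              # every line must start in column 1
        if line.startswith("USE_CACHE="):
            line = "USE_CACHE=1"          # turn caching on
        lines.append(line)
    return "\n".join(lines) + "\n"


# Sample input with one accidentally indented line.
sample = "USE_CACHE=0\n  CACHE_ROOT=/home/twaits/.dods_cache/\n"
print(enable_cache(sample), end="")
```

Back up the original file before applying a change like this to a real ~/.dodsrc.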
It is often a useful diagnostic exercise to turn caching off or clear out the cache directory when attempts to access datasets in Ferret appear inconsistent. For example, if Ferret attempted to access a DODS-served dataset that was unavailable because the DODS server was down, that information may get cached and adversely affect the next attempt at retrieving the data.
For more detailed information on using DODS, and on setting up a DODS server, see the DODS home page (http://unidata.ucar.edu/packages/dods).
A DODS client can negotiate proxy servers, with help from directions in its configuration file. The parameters that control proxy behavior are fully documented in the DODS Users Guide (see the link above).