To access ASCII data file sets, use

yes? SET DATA/EZ ASCII_file_name
yes? ! or equivalently
yes? FILE ASCII_file_name
The following are qualifiers to SET DATA/EZ or FILE:
Qualifier       Description
/VARIABLES      names the variables in the file
/TITLE          associates a title with the data set
/GRID           indicates multi-dimensional data and units
/COLUMNS        tells how many data values are in each record
/FORMAT         specifies the format of the records
/SKIP           skips initial records of the file
/ORDER          specifies order of axes (which varies fastest)
Use command SET VARIABLE to individually customize the variables.
NOTE that with the FILE or SET DATA/EZ command, there is an upper limit of 20 on the number of variables that can be read in free format from a single ASCII file. The SET DATA/FORM=DELIMITED command allows up to 100 variables per file.
When reading from ASCII, the entire grid and all variables are read when the data is requested. To work with subsets of the data, define a smaller grid to read a subset of records (perhaps writing the result to a netCDF file), then do a second read skipping those first records, and so on.
Below are several examples of reading ASCII data properly. (Uniform record length, FORTRAN-structured binary data are read similarly with the addition of the qualifier /FORMAT= "unf". See the chapter on "Data Set Basics", section "Binary Data", for other binary types). First, we look briefly at the relationship between Ferret and standard matrix notation.
Linear algebra uses established conventions in matrix notation. In a matrix A(i,j), the first index denotes a (horizontal) row and the second denotes a (vertical) column.
        A11   A12   A13   ...   A1n
        A21   A22   A23   ...   A2n
        ...   ...   ...   ...   ...
        Am1   Am2   Am3   ...   Amn

                Matrix A(i,j)
X-Y graphs follow established conventions as well, which are that X is the horizontal axis (and in a geographical context, the longitude axis) and increases to the right, and Y is the vertical axis (latitude) and increases upward (Ferret provides the /DEPTH qualifier to explicitly designate axes where the vertical axis convention is reversed).
In Ferret, the first index of a matrix, i, is associated with the first index of an (x,y) pair, x. Likewise, j corresponds to y. Element Am2, for example, corresponds graphically to x=m and y=2.
By default, Ferret stores data in the same manner as FORTRAN: the first index varies fastest. Use the qualifier /ORDER to alter this behavior. The following examples demonstrate how Ferret handles matrices.
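As an aside, the "first index varies fastest" (column-major, or FORTRAN-order) convention can be illustrated outside Ferret. Below is a sketch in Python using NumPy; the array shape and values are invented for illustration and are not part of Ferret:

```python
import numpy as np

# Twelve values as they would appear sequentially in an ASCII file.
values = np.arange(1, 13)

# Fortran (column-major) order: the first index, i (x), varies fastest,
# which is how Ferret fills a grid by default.
a = values.reshape((4, 3), order="F")   # 4 points in x, 3 in y

# The first 4 file values fill x=1..4 at the first y position.
print(a[:, 0])   # -> [1 2 3 4]
```

With C (row-major) order instead, the last index would vary fastest, which is what Ferret's /ORDER qualifier lets you control for a file.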
Example 1, 1 variable, 1 dimension
1a) Consider a data set containing the height of a plant at regular time intervals, listed in a single column:
2.3
3.1
4.5
5.6
 .
 .
 .
To access, name, and plot this variable properly, use the commands
yes? FILE/VAR=height plant.dat
yes? PLOT height
However, the data should be on a time axis. The commands above give a quick first look, but to plot the data as a time series, first define the time axis and a grid consisting of that axis, then define the data on that grid. See the examples below for multi-dimensional data.
yes? DEFINE AXIS/T=1-JUN-2014:15-JUL-2014:1/UNITS=days/T0=1-JUN-2014 tdays
yes? DEFINE GRID/T=tdays tgrid
yes? FILE/VAR=height/GRID=tgrid plant.dat
yes? PLOT height
1b) Now consider the same data, except listed in four columns:
2.3  3.1  4.5  5.6
5.7  5.9  6.1  7.2
 .
 .
 .
Because there are more values per record (4) than variables (1), use:
yes? FILE/VAR=height/COLUMNS=4 plant4.dat
yes? PLOT height
Example 2, 1 variable, 1 dimension, with a large number of data points
The simple FILE command lets Ferret set up a grid.
yes? FILE/VAR=height plant.dat
Beginning with Ferret v6.93, if the FILE command is given without a grid, Ferret determines the size of the file and defines an axis and grid on which to read the variables, taking any /SKIP and /COLUMNS qualifiers into account. This means that `VAR,RETURN=isize` for variables in the dataset returns the correct result.
[Prior to Ferret v6.93, when used on its own, SET DATA/EZ/VAR= uses an abstract axis of fixed length, 20480 points. If your data is larger than that, you can read the data by defining an axis of appropriate length. Set the length to a number equal to or larger than the dimension of your data. The plot command will plot the actual number of points in the file.]
yes? DEFINE AXIS/X=1:50000:1 longax
yes? DEFINE GRID/X=longax biggrid
yes? FILE/VAR=height/GRID=biggrid plant.dat
yes? PLOT height
Example 3, 2 variables, 1 dimension
3a) Consider a data set containing the height of a plant and the amount of water given to the plant, measured at regular time intervals:
2.3  20.4
3.1  31.2
4.5  15.7
5.6  17.3
 .    .
 .    .
 .    .
To read and plot this data use
yes? FILE/VAR="height,water" plant_wat.dat
yes? PLOT height,water
3b) The number of columns need be specified only if the number of columns exceeds the number of variables. If the data are in six columns
2.3  20.4  3.1  31.2  4.5  15.7
5.6  17.3  ...
use
yes? FILE/VAR="height,water"/COLUMNS=6 plant_wat6.dat
yes? PLOT height,water
Example 4, 1 variable, 2 dimensions
In addition to the example below, see the FAQ "Reading ASCII data representing gridded data" for ASCII files that have coordinate data listed in the file.
4a) Consider a different situation: a greenhouse with three rows of four plants and a file with a single column of data representing the height of each plant at a single time (successive values represent plants in a row of the greenhouse):
3.1
2.6
5.4
4.6
3.5
6.1
 .
 .
 .
If we want to produce a contour plot of height as a function of position in the greenhouse, axes will have to be defined:
yes? DEFINE AXIS/X=1:4:1 xplants
yes? DEFINE AXIS/Y=1:3:1 yplants
yes? DEFINE GRID/X=xplants/Y=yplants gplants
yes? FILE/VAR=height/GRID=gplants greenhouse_plants.dat
yes? CONTOUR height
When reading data, the first index, x, varies fastest. Schematically, the data will be assigned as follows:

        x=1   x=2   x=3   x=4
y=1     3.1   2.6   5.4   4.6
y=2     3.5   6.1    .     .
y=3      .     .     .     .
4b) If the file in the above example has, instead, 4 values per record:
3.1  2.6  5.4  4.6
3.5  6.1  ...
...
then add /COLUMNS=4 to the FILE command:
yes? FILE/VAR=height/COLUMNS=4/GRID=gplants greenhouse_plants.dat
Example 5, 2 variables, 2 dimensions
As in Example 4, consider a greenhouse with three rows of four plants each, and a data set with the height of each plant and the length of its longest leaf:
3.1  0.54
2.6  0.37
5.4  0.66
4.6  0.71
3.5  0.14
6.1  0.95
 .    .
 .    .
Again, axes and a grid must be defined:
yes? DEFINE AXIS/X=1:4:1 xht_leaf
yes? DEFINE AXIS/Y=1:3:1 yht_leaf
yes? DEFINE GRID/X=xht_leaf/Y=yht_leaf ght_leaf
yes? FILE/VAR="height,leaf"/GRID=ght_leaf greenhouse_ht_lf.dat
yes? SHADE height
yes? CONTOUR/OVER leaf
The above commands create a color-shaded plot of height in the greenhouse, and overlay a contour plot of leaf length. Schematically, the data will be assigned as follows:
        x=1         x=2         x=3         x=4
        ht ,  lf    ht ,  lf    ht ,  lf    ht ,  lf
y=1     3.1, 0.54   2.6, 0.37   5.4, 0.66   4.6, 0.71
y=2     3.5, 0.14   6.1, 0.95    . ,  .      . ,  .
y=3      . ,  .      . ,  .      . ,  .      . ,  .
Example 6, 2 variables, 3 dimensions (time series)
Consider the same greenhouse with height and leaf length data taken at twelve different times. The following commands will create a three-dimensional grid and a plot of the height and leaf length versus time for a specific plant.
yes? DEFINE AXIS/X=1:4:1 xplnt_tm
yes? DEFINE AXIS/Y=1:3:1 yplnt_tm
yes? DEFINE AXIS/T=1:12:1 tplnt_tm
yes? DEFINE GRID/X=xplnt_tm/Y=yplnt_tm/T=tplnt_tm gplant2
yes? FILE/VAR="height,leaf"/GRID=gplant2 green_time.dat
yes? PLOT/X=3/Y=2 height,leaf
Example 7, 1 variable, 3 dimensions, permuted order (vertical profile)
Consider a collection of oceanographic measurements made to a depth of 1000 meters. Suppose that the data file contains only a single variable, salt. Each record contains a vertical profile (11 values) at a particular x,y (long,lat) position. Supposing that successive records are at successive longitudes, the data file would look as follows (the z, x, and y labels are not actually in the file):
              z=0    z=10   z=20   . . .
x=30W,y=5S   35.89  35.90  35.93  35.97  36.02  36.05  35.96  35.40  35.13  34.89  34.72
x=29W,y=5S   35.89  35.91  35.94  35.97  36.01  36.04  35.94  35.39  35.13  34.90  34.72
   . . .
Use the qualifier /DEPTH= when defining the Z axis to indicate positive downward, and /ORDER when setting the data set to properly read in the permuted data:
yes? DEFINE AXIS/X=30W:25W:1/UNIT=degrees salx
yes? DEFINE AXIS/Y=5S:5N:1/UNIT=degrees saly
yes? DEFINE AXIS/Z=0:1000:100/UNIT=meters/DEPTH salz
yes? DEFINE GRID/X=salx/Y=saly/Z=salz salgrid
yes? FILE/ORDER=zxy/GRID=salgrid/VAR=sal/COL=11 sal.dat
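The effect of the /ORDER=zxy permutation above can be mimicked outside Ferret. Here is a sketch in Python with NumPy; the dimension sizes come from the example, but the data values are invented placeholders:

```python
import numpy as np

# 11 depths, 6 longitudes (30W..25W), 11 latitudes (5S..5N): in the file,
# z varies fastest, then x, then y -- the /ORDER=zxy layout.
nz, nx, ny = 11, 6, 11
file_values = np.arange(nz * nx * ny, dtype=float)  # stand-in for sal.dat

# Fill (z, x, y) with the first index varying fastest, then permute
# the axes to the (x, y, z) arrangement of the target grid.
zxy = file_values.reshape((nz, nx, ny), order="F")
xyz = np.transpose(zxy, (1, 2, 0))

# The first 11 file values form the vertical profile at the first
# x,y position, as in the schematic data listing above.
print(xyz[0, 0, :])
```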
See also: the EXPNDI_BY_Z and EXPNDI_BY_T functions, which take 1-D lists of data and put them onto an X-Z or X-T grid, so that a 1-D list representing a set of profiles or a set of time series may be defined with the correct Z or T axes.
2.5.2 Reading "DELIMITED" data files
SET DATA/EZ/FORMAT=DELIMITED[/DELIMITERS=][/TYPE=][/VAR=] filename
For "delimited" files, such as the comma-delimited or tab-delimited output of spreadsheets, SET DATA/EZ/FORMAT=DELIMITED initializes files of mixed numerical, string, and date fields. If the data types are not specified, the file is analyzed automatically to determine them.
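The kind of per-column analysis described above can be sketched in Python. This is hypothetical logic for illustration only, not Ferret's actual implementation (Ferret also detects latitude, longitude, date, and time columns):

```python
import csv, io

def infer_type(entries):
    """Guess a column type: 'numeric' if every non-empty entry parses
    as a float, otherwise 'text'. (A simplified sketch.)"""
    non_empty = [e.strip() for e in entries if e.strip()]
    if not non_empty:
        return "text"
    try:
        for e in non_empty:
            float(e)
        return "numeric"
    except ValueError:
        return "text"

# A tiny comma-delimited sample with a null middle column.
data = "one,,1.1\ntwo,,2.2\nthree,,3.3\n"
rows = list(csv.reader(io.StringIO(data)))
columns = list(zip(*rows))
print([infer_type(c) for c in columns])  # -> ['text', 'text', 'numeric']
```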
The alias COLUMNS stands for "SET DATA/FORMAT=DELIMITED".
The number of variables that can be read from a single file is 100 (the simple ASCII reading commands are limited to 20 variables).
Example 1: Strings, latitudes, longitudes, and numeric data.
This file is delimited by commas. Some entries are null; they are indicated by two commas with no space between. File delimited_read_1.dat contains:
col1, col2 col3 col4 col5 col6 col7
one ,, 1.1, 24S, 130E ,, 1e1
two ,, 2.2, 24N, 130W, 2S
three ,, 3.3, 24, 130, 3N, 3e-2
five ,, 4.4, -24, -130, 91, -4e2
extra line
If there is no /TYPE qualifier, the data type is determined automatically: if all entries in a column match a data type, the column is assigned that type. First let's try the file as is, using automatic analysis. Record 1 contains 5 column headings (text), so V1 through V5 are analyzed as text variables.
yes? FILE/FORMAT=delim delimited_read_1.dat
yes? LIST v1,v2,v3,v4,v5,v6,v7,v8,v9,v10
             DATA SET: ./delimited_read_1.dat
             X: 0.5 to 7.5
 Column  1: V1
 Column  2: V2
 Column  3: V3
 Column  4: V4
 Column  5: V5
 Column  6: V6
 Column  7: V7
 Column  8: V8
 Column  9: V9
 Column 10: V10
              V1          V2      V3      V4      V5      V6      V7    V8         V9    V10
1 / 1:  "col1"        "col2"  "col3"  "col4"  "col5"  " "     ....   " "        " "   ....
2 / 2:  "one"         " "     "1.1"   "24S"   "130E"  " "     10.0   "word 1"   " "   ....
3 / 3:  "two"         " "     "2.2"   "24N"   "130W"  "2S"    ....   "word 2"   " "   ....
4 / 4:  "three"       " "     "3.3"   "24"    "130"   "3N"     0.0   " wd 3 "   " "   ....
5 / 5:  " "           " "     " "     " "     " "     " "     ....   " "        " "   ....
6 / 6:  "five"        " "     "4.4"   "-24"   "-130"  "91"  -400.0   "word 4"   "aa"  77.00
7 / 7:  "extra line"  " "     " "     " "     " "     " "     ....   " "        " "   ....
Now skip the first record to get a better analysis of the file fields, and explicitly name the variables. Note that A3 is correctly analyzed as numeric, A4 as latitude, and A5 as longitude. A6 is analyzed as string data because the value 91 in record 5 does not fall in the range for latitudes, and records 2 and 3 contain mixed numbers and letters.
yes? FILE/FORMAT=DELIM/SKIP=1/VAR="a1,a2,a3,a4,a5,a6,a7,a8,a9" delimited_read_1.dat
yes? LIST a1,a2,a3,a4,a5,a6,a7
             DATA SET: ./delimited_read_1.dat
             X: 0.5 to 6.5
 Column  1: A1
 Column  2: A2 is A2 (all values missing)
 Column  3: A3
 Column  4: A4 is A4 (degrees_north)(Latitude)
 Column  5: A5 is A5 (degrees_east)(Longitude)
 Column  6: A6
 Column  7: A7
              A1         A2     A3       A4       A5    A6       A7
1 / 1:  "one"         ...   1.100   -24.00    130.0  " "      10.0
2 / 2:  "two"         ...   2.200    24.00   -130.0  "2S"     ....
3 / 3:  "three"       ...   3.300    24.00    130.0  "3N"      0.0
4 / 4:  " "           ...    ....     ....     ....  " "      ....
5 / 5:  "five"        ...   4.400   -24.00   -130.0  "91"   -400.0
6 / 6:  "extra line"  ...    ....     ....     ....  " "      ....
Now use the /TYPE qualifier to specify that all columns be treated as numeric.
yes? FILE/FORMAT=delim/SKIP=1/TYPE=numeric delimited_read_1.dat
yes? LIST v1,v2,v3,v4,v5,v6,v7,v8
             DATA SET: ./delimited_read_1.dat
             X: 0.5 to 6.5
 Column  1: V1
 Column  2: V2
 Column  3: V3
 Column  4: V4
 Column  5: V5
 Column  6: V6
 Column  7: V7
 Column  8: V8
          V1    V2     V3       V4       V5     V6      V7    V8
1 / 1:  ....  ....  1.100     ....     ....   ....    10.0  ....
2 / 2:  ....  ....  2.200     ....     ....   ....    ....  ....
3 / 3:  ....  ....  3.300    24.00    130.0   ....     0.0  ....
4 / 4:  ....  ....   ....     ....     ....   ....    ....  ....
5 / 5:  ....  ....  4.400   -24.00   -130.0  91.00  -400.0  ....
6 / 6:  ....  ....   ....     ....     ....   ....    ....  ....
Here is how to read only the first line of the file. If the variables are not specified, 7 variables are generated, because auto-analysis of the file doesn't stop at the first record. Use the command COLUMNS, the alias for FILE/FORMAT=delimited.
yes? DEFINE AXIS/X=1:1:1 x1
yes? DEFINE GRID/X=x1 g1
yes? COLUMNS/GRID=g1 delimited_read_1.dat
yes? LIST v1,v2,v3,v4,v5,v6,v7
             DATA SET: ./delimited_read_1.dat
             X: 1
 Column  1: V1
 Column  2: V2
 Column  3: V3
 Column  4: V4
 Column  5: V5
 Column  6: V6
 Column  7: V7
            V1      V2      V3      V4      V5   V6    V7
I / *:  "col1"  "col2"  "col3"  "col4"  "col5"  " "  ....
Define the variables to read.
yes? COLUMNS/GRID=g1/VAR="c1,c2,c3,c4,c5" delimited_read_1.dat
yes? LIST c1,c2,c3,c4,c5
             DATA SET: ./delimited_read_1.dat
             X: 1
 Column  1: C1
 Column  2: C2
 Column  3: C3
 Column  4: C4
 Column  5: C5
            C1      C2      C3      C4      C5
I / *:  "col1"  "col2"  "col3"  "col4"  "col5"
Example 2: File using blank as a delimiter.
Ferret recognizes the file as containing date and time variables; these are explored further in Example 3 below. Here is the file delimited_read_2.dat. Record 2 consists entirely of blank characters.
1981/12/03 12:35:00
1895/2/6 13:45:05
Read the file using /DELIMITER=" "
yes? FILE/FORM=delimited/DELIMITER=" " delimited_read_2.dat
yes? LIST v1,v2
             DATA SET: ./delimited_read_2.dat
             X: 0.5 to 3.5
 Column  1: V1 is V1 (days)(Julian days since 1-Jan-1900)
 Column  2: V2 is V2 (hours)(Time of day)
            V1     V2
1 / 1:  37965.  12.58
2 / 2:    ....   ....
3 / 3:  39051.  13.75
More on delimiters: Any ASCII character can be used as a delimiter. If letters are used they are case-sensitive.
So if the file d2.dat contains
1234.44H59.23# -8.9
1111.23H59.23# 2.3
123.79H59.23#56.1
yes? FILE/FORM=delim/DELIMITERS="H,#"/VAR="v1,v2,v3" d2.dat
yes? list/PRECISION=6 v1,v2,v3
             DATA SET: ./d2.dat
             X: 0.5 to 3.5
 Column  1: V1
 Column  2: V2
 Column  3: V3
             V1       V2       V3
1 / 1:  1234.44  59.2300  -8.9000
2 / 2:  1111.23  59.2300   2.3000
3 / 3:   123.79  59.2300  56.1000
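The same multi-delimiter split can be reproduced outside Ferret. Here is a sketch in Python with the standard `re` module, using the first record of d2.dat:

```python
import re

# Splitting on either "H" or "#", mirroring /DELIMITERS="H,#".
# Letter delimiters are case-sensitive, so a lowercase "h" would not split.
record = "1234.44H59.23# -8.9"
fields = [float(f) for f in re.split(r"[H#]", record)]
print(fields)  # -> [1234.44, 59.23, -8.9]
```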
Example 3: dates and times
(See the next example for reading a date and time as a single field.)
Here is delimited_read_3.dat:
12/1/99, 12:00, 12/1/99, 1999-03-01, 12:00, 13:45:36.5
12/2/99, 01:00:13.5, 12/2/99, 1999-03-02, 01:00:13.5, 14:45:36.5
12/3/99, 02:00, 12/3/99, 1999-03-03, 2:00, 15:45
12/4/99, 03:00, 12/4/99, 1999-03-04, 03:00, 16:45:36.5
Read it with auto-detection of the field types:
yes? COLUMNS delimited_read_3.dat
yes? show data
     currently SET data sets:
    1> ./delimited_read_3.dat  (default)
 name     title                            I         J         K         L
 V1       V1                              1:4       ...       ...       ...   (Julian days since 1-Jan-1900)
 V2       V2                              1:4       ...       ...       ...   (Time of day)
 V3       V3                              1:4       ...       ...       ...   (Julian days since 1-Jan-1900)
 V4       V4                              1:4       ...       ...       ...   (Julian days since 1-Jan-1900)
 V5       V5                              1:4       ...       ...       ...   (Time of day)
 V6       V6                              1:4       ...       ...       ...   (Time of day)
What if there are errors in the first 4 fields? Here is delimited_read_3A.dat with errors in record 3:
12/1/99, 12:00, 12/1/99, 1999-03-01, 12:00, 13:45:36.5
12/2/99, 01:00:13.5, 12/2/99, 1999-03-02, 01:00:13.5, 14:45:36.5
12/3/99x, 2:00x, 12/3/99, 1999-03-03, 2:00, 15:45
12/4/99, 03:00, 12/4/99, 1999-03-04, 03:00, 16:45:36.5
Read with auto-analysis. The records with syntax errors cause variables 1 and 2 to be read as string variables.
yes? COLUMNS delimited_read_3A.dat
yes? LIST v1,v2,v3,v4,v5,v6
             DATA SET: ./delimited_read_3A.dat
             X: 0.5 to 4.5
 Column  1: V1
 Column  2: V2
 Column  3: V3 is V3 (days)(Julian days since 1-Jan-1900)
 Column  4: V4 is V4 (days)(Julian days since 1-Jan-1900)
 Column  5: V5 is V5 (hours)(Time of day)
 Column  6: V6 is V6 (hours)(Time of day)
               V1            V2         V3      V4     V5     V6
1 / 1:  "12/1/99"   "12:00"        36493.  36218.  12.00  13.76
2 / 2:  "12/2/99"   "01:00:13.5"   36494.  36219.   1.00  14.76
3 / 3:  "12/3/99x"  "2:00x"        36495.  36220.   2.00  15.75
4 / 4:  "12/4/99"   "03:00"        36496.  36221.   3.00  16.76
Use the date variables in v3 and v4 to define time axes.
yes? DEFINE AXIS/T/UNITS=days/T0=1-jan-1900 tax = v3
yes? SHOW AXIS tax
 name       axis              # pts   start                end
 TAX       TIME                4 r   01-DEC-1999 00:00    04-DEC-1999 00:00
   T0 = 1-JAN-1900
   Axis span (to cell edges) = 4

yes? DEFINE AXIS/T/UNITS=days/T0=1-jan-1900 tax = v4
 Replacing definition of axis TAX
yes? SHOW AXIS tax
 name       axis              # pts   start                end
 TAX       TIME                4 r   01-MAR-1999 00:00    04-MAR-1999 00:00
   T0 = 1-JAN-1900
   Axis span (to cell edges) = 4
Next we'll specify each column's type. Only the first two characters of each type name are needed. Now the columns that had errors can be read; only the record containing the errors is set to missing.
yes? COLUMNS/TYPE="da,ti,date, date, time, time" delimited_read_3A.dat
yes? LIST v1,v2,v3,v4,v5,v6
             DATA SET: ./delimited_read_3A.dat
             X: 0.5 to 4.5
 Column  1: V1 is V1 (days)(Julian days since 1-Jan-1900)
 Column  2: V2 is V2 (hours)(Time of day)
 Column  3: V3 is V3 (days)(Julian days since 1-Jan-1900)
 Column  4: V4 is V4 (days)(Julian days since 1-Jan-1900)
 Column  5: V5 is V5 (hours)(Time of day)
 Column  6: V6 is V6 (hours)(Time of day)
            V1     V2      V3      V4     V5     V6
1 / 1:  36493.  12.00  36493.  36218.  12.00  13.76
2 / 2:  36494.   1.00  36494.  36219.   1.00  14.76
3 / 3:    ....   ....  36495.  36220.   2.00  15.75
4 / 4:  36496.   3.00  36496.  36221.   3.00  16.76
Delimiters can also be used to break up individual fields. Use both the slash and the comma as delimiters (the comma must be escaped with a backslash: \,) to read the day, month, and year into separate fields:
yes? FILE/FORM=delim/DELIM="/,\," delimited_read_3.dat
yes? list v1,v2,v3,v4
             DATA SET: ./delimited_read_3.dat
             X: 0.5 to 4.5
 Column  1: V1
 Column  2: V2
 Column  3: V3
 Column  4: V4 is V4 (hours)(Time of day)
           V1     V2     V3     V4
1 / 1:  12.00  1.000  99.00  12.00
2 / 2:  12.00  2.000  99.00   1.00
3 / 3:  12.00  3.000  99.00   2.00
4 / 4:  12.00  4.000  99.00   3.00
Example 4: dates-times as a single field
Beginning with Ferret v7.1, two additional data types, datime and edatime, are available. These are combined "date and time" and "Eurodate and time" fields.
Consider this file, delim_datetime.csv with two header lines and some combination date-time fields:
example file with Euro-date/time and US date/time records
index,euro-date-time, us-date-time, us-date, eurodate
1, 22/01/2014 01:00:00, 01/22/20 06:40:00, 5/20/95, 20/5/91
2, 22/01/2014 02:20:00, 01/22/21 07:40:00, 5/20/05, 20/6/93
3, 22/01/2014 03:40:00, 01/22/22 08:40:00, 5/20/15, 20/7/95
4, 22/01/2014 04:00:00, 01/22/23 09:40:00, 5/20/25, 20/8/99
5, 22/01/2014 05:20:00, 01/22/24 10:40:00, 5/20/35, 20/10/02
Read the data. Note that 2-digit years in the last two columns are interpreted as falling in either the 1900s or the 2000s: years below 50 are placed in the 2000s, and years 50 and higher in the 1900s. Four-digit years are recommended.
yes? columns/skip=3/var="index,edtim,udtim,udate,edate"\
/type="num,edatime,datime,date,eurodate" delim_datetime.csv
yes? show data
     currently SET data sets:
    1> ./delim_datetime.csv  (default)
 name     title                            I         J         K         L
 INDEX    index                           1:8       ...       ...       ...
 EDTIM    edtim                           1:8       ...       ...       ...   (Julian days since 1-Jan-1900)
 UDTIM    udtim                           1:8       ...       ...       ...   (Julian days since 1-Jan-1900)
 UDATE    udate                           1:8       ...       ...       ...   (Julian days since 1-Jan-1900)
 EDATE    edate                           1:8       ...       ...       ...   (Julian days since 1-Jan-1900)

yes? define axis/t/t0=1-JAN-1900/units=days time1 = edtim
yes? sh axis time1
 name       axis              # pts   start                end
 TIME1     TIME                5 i   22-JAN-2014 01:00    22-JAN-2014 05:20
   T0 = 1-JAN-1900
   Axis span (to cell edges) = 0.2361111

yes? define axis/t/t0=1-JAN-1900/units=days time2 = udate
yes? sh axis time2
 name       axis              # pts   start                end
 TIME2     TIME                5 i   20-MAY-1995 00:00    20-MAY-2035 00:00
   T0 = 1-JAN-1900
   Axis span (to cell edges) = 18262.5
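The 2-digit-year rule described above can be written out as a tiny function. This is an illustration of the stated pivot-at-50 rule, not Ferret's actual code:

```python
def expand_year(yy):
    """Expand a 2-digit year: values below 50 map to the 2000s,
    values of 50 and higher to the 1900s (the rule stated above)."""
    return 2000 + yy if yy < 50 else 1900 + yy

# The us-date column values 5/20/95, 5/20/05, 5/20/35 become
# 1995, 2005, and 2035, matching the time2 axis output above.
print(expand_year(95), expand_year(5), expand_year(35))  # -> 1995 2005 2035
```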
A quick note: when reading ASCII data, one sometimes sees the error message:
 **TMAP ERR: Host is down
             Last or next-to-last record read:

"Host is down" is a system-generated error message indicating a read error. Check that the data in the file is numeric, that there are no header records you have forgotten to skip, and that the file is Unix-formatted rather than Windows-formatted.
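One quick way to check the last point, Windows versus Unix line endings, is to look for carriage-return/line-feed pairs in the raw bytes. A sketch in Python (the file name is just an example):

```python
import tempfile, os

def has_crlf(path):
    """Return True if the file contains Windows (CRLF) line endings,
    a common cause of ASCII read errors on Unix."""
    with open(path, "rb") as f:
        return b"\r\n" in f.read()

# Demonstrate on a Windows-style file written to a temporary location.
tmp = os.path.join(tempfile.gettempdir(), "sample_crlf.dat")
with open(tmp, "wb") as f:
    f.write(b"2.3 20.4\r\n3.1 31.2\r\n")
print(has_crlf(tmp))  # -> True
```

A file flagged this way can be converted with a tool such as dos2unix before reading it into Ferret.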