2.5 ASCII DATA
To access ASCII data sets use
yes? SET DATA/EZ ASCII_file_name
or, equivalently,
yes? FILE ASCII_file_name
The following are qualifiers to SET DATA/EZ or FILE:
| Qualifier | Description |
| /VARIABLES | names the variables in the file |
| /TITLE | associates a title with the data set |
| /GRID | indicates multi-dimensional data and units |
| /COLUMNS | tells how many data values are in each record |
| /FORMAT | specifies the format of the records |
| /SKIP | skips initial records of the file |
| /ORDER | specifies order of axes (which varies fastest) |
Use command SET VARIABLE to individually customize the variables.
2.5.1 Reading ASCII files
Below are several examples of reading ASCII data properly. Uniform record length, FORTRAN-structured binary data are read similarly, with the addition of the qualifier /FORMAT="unf"; see the chapter "Data Set Basics", section "Binary Data", for other binary types. First, we look briefly at the relationship between Ferret and standard matrix notation.
Linear algebra uses established conventions in matrix notation. In a matrix A(i,j), the first index denotes a (horizontal) row and the second denotes a (vertical) column.
Matrix A(i,j):
A11 A12 A13 ... A1n
A21 A22 A23 ... A2n
...
Am1 Am2 Am3 ... Amn
X-Y graphs follow established conventions as well, which are that X is the horizontal axis (and in a geographical context, the longitude axis) and increases to the right, and Y is the vertical axis (latitude) and increases upward (Ferret provides the /DEPTH qualifier to explicitly designate axes where the vertical axis convention is reversed).
In Ferret, the first index of a matrix, i, is associated with the first index of an (x,y) pair, x. Likewise, j corresponds to y. Element Am2, for example, corresponds graphically to x=m and y=2.
By default, Ferret stores data in the same manner as FORTRAN—the first index varies fastest. Use the qualifier /ORDER to alter this behavior. The following examples demonstrate how Ferret handles matrices.
Example 1—1 variable, 1 dimension
1a) Consider a data set containing the height of a plant at regular time intervals, listed in a single column:
2.3
3.1
4.5
5.6
. . .
To access, name, and plot this variable properly, use the commands
yes? FILE/VAR=height plant.dat
yes? PLOT height
1b) Now consider the same data, except listed in four columns:
2.3 3.1 4.5 5.6
5.7 5.9 6.1 7.2
. . .
Because there are more values per record (4) than variables (1), use:
yes? FILE/VAR=height/COLUMNS=4 plant4.dat
yes? PLOT height
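The idea behind /COLUMNS for a single variable is that record boundaries carry no meaning: the values form one continuous stream. The following Python sketch illustrates that idea only; it is not Ferret code, and `read_stream` is a hypothetical helper introduced for illustration.

```python
def read_stream(text):
    """Return every whitespace-separated number in text as one flat list."""
    return [float(tok) for line in text.splitlines() for tok in line.split()]

# The first four heights from the example, in the two layouts.
one_column = "2.3\n3.1\n4.5\n5.6\n"
four_columns = "2.3 3.1 4.5 5.6\n"

# Both layouts yield the same sequence of values.
same = read_stream(one_column) == read_stream(four_columns)
```

Here `same` is True: the layout of values within records changes how the file is described to Ferret, not the sequence of data values.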
Example 2—1 variable, 1 dimension, with a large number of data points.
The simple FILE command:
yes? FILE/VAR=height plant.dat
uses an abstract axis of fixed length, 20480 points. If your data is larger than that, you can read the data by defining an axis of appropriate length. Set the length to a number equal to or larger than the dimension of your data. The plot command will plot the actual number of points in the file.
yes? DEFINE AXIS/X=1:50000:1 longax
yes? DEFINE GRID/X=longax biggrid
yes? FILE/VAR=height/GRID=biggrid plant.dat
yes? PLOT height
Example 3—2 variables, 1 dimension
3a) Consider a data set containing the height of a plant and the amount of water given to the plant, measured at regular time intervals:
2.3 20.4
3.1 31.2
4.5 15.7
5.6 17.3
. . .
To read and plot this data use
yes? FILE/VAR="height,water" plant_wat.dat
yes? PLOT height,water
3b) The number of columns need be specified only if the number of columns exceeds the number of variables. If the data are in six columns
2.3 20.4 3.1 31.2 4.5 15.7
5.6 17.3 ...
use
yes? FILE/VAR="height,water"/COLUMNS=6 plant_wat6.dat
yes? PLOT height,water
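With two variables, successive values in the stream are assigned to the variables in turn, whatever the number of columns per record. A minimal Python sketch of that assignment, assuming the values have already been read into a flat list (`deinterleave` is a hypothetical name, not a Ferret function):

```python
def deinterleave(values, names):
    """Split a flat stream into one list per variable, cycling through names."""
    n = len(names)
    return {name: values[i::n] for i, name in enumerate(names)}

# The eight values shown in the six-column file above, in reading order.
stream = [2.3, 20.4, 3.1, 31.2, 4.5, 15.7, 5.6, 17.3]
data = deinterleave(stream, ["height", "water"])
# data["height"] -> [2.3, 3.1, 4.5, 5.6]
# data["water"]  -> [20.4, 31.2, 15.7, 17.3]
```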
Example 4—1 variable, 2 dimensions
4a) Consider a different situation: a greenhouse with three rows of four plants and a file with a single column of data representing the height of each plant at a single time (successive values represent plants in a row of the greenhouse):
3.1
2.6
5.4
4.6
3.5
6.1
. . .
If we want to produce a contour plot of height as a function of position in the greenhouse, axes will have to be defined:
yes? DEFINE AXIS/X=1:4:1 xplants
yes? DEFINE AXIS/Y=1:3:1 yplants
yes? DEFINE GRID/X=xplants/Y=yplants gplants
yes? FILE/VAR=height/GRID=gplants greenhouse_plants.dat
yes? CONTOUR height
When reading data the first index, x, varies fastest. Schematically, the data will be assigned as follows:
x=1 x=2 x=3 x=4
y=1 3.1 2.6 5.4 4.6
y=2 3.5 6.1 . . .
y=3 . . .
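The schematic above can be reproduced with a few lines of Python. This is an illustrative sketch of the "first index varies fastest" rule, not Ferret's implementation; `to_grid` is a hypothetical helper and the coordinates are 1-based to match the schematic.

```python
def to_grid(values, nx, ny):
    """Place a flat stream on an nx-by-ny grid with x varying fastest.

    Returns a dict keyed by 1-based (x, y). Grid points for which the
    stream has no value are simply absent.
    """
    grid = {}
    for k, v in enumerate(values[: nx * ny]):
        x = k % nx + 1   # x cycles 1..nx
        y = k // nx + 1  # y advances once per full row of x
        grid[(x, y)] = v
    return grid

# The six heights listed in the example file.
heights = [3.1, 2.6, 5.4, 4.6, 3.5, 6.1]
grid = to_grid(heights, nx=4, ny=3)
# grid[(1, 1)] -> 3.1, grid[(4, 1)] -> 4.6, grid[(2, 2)] -> 6.1
```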
4b) If the file in the above example has, instead, 4 values per record:
3.1 2.6 5.4 4.6
3.5 6.1 . . .
then add /COLUMNS=4 to the FILE command:
yes? FILE/VAR=height/COLUMNS=4/GRID=gplants greenhouse_plants.dat
Example 5—2 variables, 2 dimensions
As in Example 4, consider a greenhouse with three rows of four plants each, and a data set with the height of each plant and the length of its longest leaf:
3.1 0.54
2.6 0.37
5.4 0.66
4.6 0.71
3.5 0.14
6.1 0.95
. .
. .
Again, axes and a grid must be defined:
yes? DEFINE AXIS/X=1:4:1 xht_leaf
yes? DEFINE AXIS/Y=1:3:1 yht_leaf
yes? DEFINE GRID/X=xht_leaf/Y=yht_leaf ght_leaf
yes? FILE/VAR="height,leaf"/GRID=ght_leaf greenhouse_ht_lf.dat
yes? SHADE height
yes? CONTOUR/OVER leaf
The above commands create a color-shaded plot of height in the greenhouse, and overlay a contour plot of leaf length. Schematically, the data will be assigned as follows:
x=1 x=2 x=3 x=4
ht , lf ht , lf
y=1 3.1, 0.54 2.6, 0.37 5.4, 0.66 4.6, 0.71
y=2 3.5, 0.14 6.1, 0.95 . . .
y=3 . . .
Example 6—2 variables, 3 dimensions (time series)
Consider the same greenhouse with height and leaf length data taken at twelve different times. The following commands will create a three-dimensional grid and a plot of the height and leaf length versus time for a specific plant.
yes? DEFINE AXIS/X=1:4:1 xplnt_tm
yes? DEFINE AXIS/Y=1:3:1 yplnt_tm
yes? DEFINE AXIS/T=1:12:1 tplnt_tm
yes? DEFINE GRID/X=xplnt_tm/Y=yplnt_tm/T=tplnt_tm gplant2
yes? FILE/VAR="height,leaf"/GRID=gplant2 green_time.dat
yes? PLOT/X=3/Y=2 height, leaf
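To see which values PLOT/X=3/Y=2 selects, it helps to write out where a grid point sits in the stream of values for one variable. The Python sketch below assumes the default ordering (x fastest, then y, then t) and uses synthetic data in which each value equals its own position; `flat_index` and `time_series` are hypothetical names, not Ferret functions.

```python
def flat_index(x, y, t, nx=4, ny=3):
    """0-based stream position of grid point (x, y, t); inputs are 1-based."""
    return (x - 1) + nx * (y - 1) + nx * ny * (t - 1)

def time_series(values, x, y, nt, nx=4, ny=3):
    """Values at a fixed (x, y) for t = 1..nt."""
    return [values[flat_index(x, y, t, nx, ny)] for t in range(1, nt + 1)]

# Synthetic stand-in for one variable: 4 x 3 x 12 values, value == position.
values = list(range(4 * 3 * 12))
series = time_series(values, x=3, y=2, nt=12)
# series[0] -> 6, series[1] -> 18: consecutive times are nx*ny = 12 values apart
```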
Example 7—1 variable, 3 dimensions, permuted order (vertical profile)
Consider a collection of oceanographic measurements made to a depth of 1000 meters. Suppose that the data file contains only a single variable, salt. Each record contains a vertical profile (11 values) of a particular x,y (long,lat) position. Supposing that successive records are successive longitudes, the data file would look as follows (assume the equivalencies are not in the file):
z=0 z=100 z=200 . . .
x=30W,y=5S 35.89 35.90 35.93 35.97 36.02 36.05 35.96 35.40 35.13 34.89 34.72
x=29W,y=5S 35.89 35.91 35.94 35.97 36.01 36.04 35.94 35.39 35.13 34.90 34.72
. . .
Use the qualifier /DEPTH when defining the Z axis to indicate positive downward, and /ORDER when setting the data set to properly read in the permuted data:
yes? DEFINE AXIS/X=30W:25W:1/UNIT=degrees salx
yes? DEFINE AXIS/Y=5S:5N:1/UNIT=degrees saly
yes? DEFINE AXIS/Z=0:1000:100/UNIT=meters/DEPTH salz
yes? DEFINE GRID/X=salx/Y=saly/Z=salz salgrid
yes? FILE/ORDER=zxy/GRID=salgrid/VAR=salt/COL=11 sal.dat
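The effect of /ORDER=zxy can be pictured as a change in which index advances first as values are read. The following Python sketch is an illustration under that assumption, not Ferret's code; `position_zxy` is a hypothetical helper, and the axis lengths are taken from the DEFINE AXIS commands above (11 depths, 6 longitudes).

```python
def position_zxy(k, nz, nx):
    """Map 0-based stream position k to 1-based (x, y, z) indices when
    z varies fastest, then x, then y."""
    z = k % nz + 1
    x = (k // nz) % nx + 1
    y = k // (nz * nx) + 1
    return x, y, z

NZ, NX = 11, 6  # 0..1000 m by 100 m; 30W..25W by 1 degree
# position_zxy(0, NZ, NX)  -> (1, 1, 1)  first value: x=30W, y=5S, z=0
# position_zxy(11, NZ, NX) -> (2, 1, 1)  start of the second record: x=29W
# position_zxy(66, NZ, NX) -> (1, 2, 1)  first record of the next latitude
```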
2.5.2 Reading "DELIMITED" data files
SET DATA/FORMAT=DELIMITED[/DELIMITERS=][/TYPE=][/VAR=] filename
For "delimited" files, such as spreadsheet output, SET DATA/FORMAT=DELIMITED initializes files containing a mix of numeric, string, and date fields. If the data types are not specified, the file is analyzed automatically to determine them.
The alias COLUMNS stands for "SET DATA/FORMAT=DELIMITED".
Example 1: Strings, latitudes, longitudes, and numeric data.
This file is delimited by commas. Some entries are null; they are indicated by two commas with no space between. File delimited_read_1.dat contains:
col1, col2 col3 col4 col5 col6 col7
one ,, 1.1, 24S, 130E ,, 1e1
two ,, 2.2, 24N, 130W, 2S
three ,, 3.3, 24, 130, 3N, 3e-2
five ,, 4.4, -24, -130, 91, -4e2
extra line
If there is no /TYPE qualifier, the data type is automatically determined. If all entries in the column match a data type they are assigned that type. First let's try the file as is, using automatic analysis. Record 1 contains 5 column headings (text) so V1 through V5 are analyzed as text variables.
yes? FILE/FORMAT=delim delimited_read_1.dat
yes? LIST v1,v2,v3,v4,v5,v6,v7,v8,v9,v10
DATA SET: ./delimited_read_1.dat
X: 0.5 to 7.5
Column 1: V1
Column 2: V2
Column 3: V3
Column 4: V4
Column 5: V5
Column 6: V6
Column 7: V7
Column 8: V8
Column 9: V9
Column 10: V10
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1 / 1: "col1" "col2" "col3" "col4" "col5" " " .... " " " " ....
2 / 2: "one" " " "1.1" "24S" "130E" " " 10.0 "word 1" " " ....
3 / 3: "two" " " "2.2" "24N" "130W" "2S" .... "word 2" " " ....
4 / 4: "three" " " "3.3" "24" "130" "3N" 0.0 " wd 3 " " " ....
5 / 5: " " " " " " " " " " " " .... " " " " ....
6 / 6: "five" " " "4.4" "-24" "-130" "91" -400.0 "word 4" "aa" 77.00
7 / 7: "extra line" " " " " " " " " " " .... " " " " ....
Now skip the first record to do a better "analysis" of the file fields. Explicitly name the variables. Note that A3 is correctly analyzed as numeric, A4 as latitude, and A5 as longitude. A6 is analyzed as string data, because the value 91 in record 5 does not fall in the range for latitudes, and records 2 and 3 contain mixed numbers and letters.
yes? FILE/FORMAT=DELIM/SKIP=1/VAR="a1,a2,a3,a4,a5,a6,a7,a8,a9" delimited_read_1.dat
yes? LIST a1,a2,a3,a4,a5,a6,a7
DATA SET: ./delimited_read_1.dat
X: 0.5 to 6.5
Column 1: A1
Column 2: A2 is A2 (all values missing)
Column 3: A3
Column 4: A4 is A4 (degrees_north)(Latitude)
Column 5: A5 is A5 (degrees_east)(Longitude)
Column 6: A6
Column 7: A7
A1 A2 A3 A4 A5 A6 A7
1 / 1: "one" ... 1.100 -24.00 130.0 " " 10.0
2 / 2: "two" ... 2.200 24.00 -130.0 "2S" ....
3 / 3: "three" ... 3.300 24.00 130.0 "3N" 0.0
4 / 4: " " ... .... .... .... " " ....
5 / 5: "five" ... 4.400 -24.00 -130.0 "91" -400.0
6 / 6: "extra line"... .... .... .... " " ....
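The rule "if all entries in the column match a data type they are assigned that type" can be sketched in Python as follows. This is a rough illustration only: the order in which types are tried, the regular expressions, the treatment of blank entries, and the longitude range are all assumptions made for the sketch, not a description of Ferret's actual analysis.

```python
import re

# Assumed patterns: a number with an optional hemisphere letter.
_LAT = re.compile(r"^([+-]?\d+(?:\.\d*)?)([NS]?)$", re.IGNORECASE)
_LON = re.compile(r"^([+-]?\d+(?:\.\d*)?)([EW]?)$", re.IGNORECASE)

def _is_number(s):
    try:
        float(s)
        return True
    except ValueError:
        return False

def _is_lat(s):
    m = _LAT.match(s)
    return bool(m) and abs(float(m.group(1))) <= 90

def _is_lon(s):
    m = _LON.match(s)
    return bool(m) and abs(float(m.group(1))) <= 360  # assumed limit

def classify(column):
    """Assign one type to a column; blank entries are ignored."""
    entries = [e.strip() for e in column if e.strip()]
    if not entries:
        return "missing"
    for label, test in (("numeric", _is_number),
                        ("latitude", _is_lat),
                        ("longitude", _is_lon)):
        if all(test(e) for e in entries):
            return label
    return "string"

# Columns A3..A6 from the listing above (blank record included).
a3 = classify(["1.1", "2.2", "3.3", "", "4.4"])        # "numeric"
a4 = classify(["24S", "24N", "24", "", "-24"])         # "latitude"
a5 = classify(["130E", "130W", "130", "", "-130"])     # "longitude"
a6 = classify(["", "2S", "3N", "", "91"])              # "string"
```

Under these assumptions, a single out-of-range value such as 91 is enough to make the whole of A6 fall back to string, which matches the listing.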
Now use the /TYPE qualifier to specify that all columns be treated as numeric.
yes? FILE/FORMAT=delim/SKIP=1/TYPE=numeric delimited_read_1.dat
yes? LIST v1,v2,v3,v4,v5,v6,v7,v8
DATA SET: ./delimited_read_1.dat
X: 0.5 to 6.5
Column 1: V1
Column 2: V2
Column 3: V3
Column 4: V4
Column 5: V5
Column 6: V6
Column 7: V7
Column 8: V8
V1 V2 V3 V4 V5 V6 V7 V8
1 / 1:...... 1.100 .... .... .... 10.0...
2 / 2:...... 2.200 .... .... .... .......
3 / 3:...... 3.300 24.00 130.0 .... 0.0...
4 / 4:...... .... .... .... .... .......
5 / 5:...... 4.400 -24.00 -130.0 91.00 -400.0...
6 / 6:...... .... .... .... .... .......
Here is how to read only the first line of the file. If the variables are not specified, 7 variables are generated, because auto-analysis of the file does not stop at the first record. Use the command COLUMNS, the alias for FILE/FORMAT=delimited.
yes? DEFINE AXIS/X=1:1:1 x1
yes? DEFINE GRID/X=x1 g1
yes? COLUMNS/GRID=g1 delimited_read_1.dat
yes? list v1,v2,v3,v4,v5,v6,v7
DATA SET: ./delimited_read_1.dat
X: 1
Column 1: V1
Column 2: V2
Column 3: V3
Column 4: V4
Column 5: V5
Column 6: V6
Column 7: V7
V1 V2 V3 V4 V5 V6 V7
I / *: "col1" "col2" "col3" "col4" "col5" " "...
Define the variables to read.
yes? COLUMNS/GRID=g1/VAR="c1,c2,c3,c4,c5" delimited_read_1.dat
yes? LIST c1,c2,c3,c4,c5
DATA SET: ./delimited_read_1.dat
X: 1
Column 1: C1
Column 2: C2
Column 3: C3
Column 4: C4
Column 5: C5
C1 C2 C3 C4 C5
I / *: "col1" "col2" "col3" "col4" "col5"
Example 2: File using blank as a delimiter.
Ferret recognizes the file as containing date and time variables, which are explored further in Example 3 below. Here is the file delimited_read_2.dat; record 2 consists entirely of blanks.
1981/12/03 12:35:00
1895/2/6 13:45:05
Read the file using /DELIMITER=" "
yes? FILE/FORM=delimited/DELIMITER=" " delimited_read_2.dat
yes? LIST v1,v2
DATA SET: ./delimited_read_2.dat
X: 0.5 to 3.5
Column 1: V1 is V1 (days)(Julian days since 1-Jan-1900)
Column 2: V2 is V2 (hours)(Time of day)
V1 V2
1 / 1: 37965. 12.58
2 / 2: .... ....
3 / 3: 39051. 13.75
Example 3: dates and times
Note that record 3 has syntax errors in the first 2 fields. Here is delimited_read_3.dat:
12/1/99, 12:00, 12/1/99, 1999-03-01, 12:00, 13:45:36.5
12/2/99, 01:00:13.5, 12/2/99, 1999-03-02, 01:00:13.5, 14:45:36.5
12/3/99x, 2:00x, 12/3/99, 1999-03-03, 2:00, 15:45
12/4/99, 03:00, 12/4/99, 1999-03-04, 03:00, 16:45:36.5
Read with auto-analysis. The records with syntax errors cause variables 1 and 2 to be read as string variables.
yes? COLUMNS delimited_read_3.dat
yes? LIST v1,v2,v3,v4,v5,v6
DATA SET: ./delimited_read_3.dat
X: 0.5 to 4.5
Column 1: V1
Column 2: V2
Column 3: V3 is V3 (days)(Julian days since 1-Jan-1900)
Column 4: V4 is V4 (days)(Julian days since 1-Jan-1900)
Column 5: V5 is V5 (hours)(Time of day)
Column 6: V6 is V6 (hours)(Time of day)
V1 V2 V3 V4 V5 V6
1 / 1: "12/1/99" "12:00" 36493. 36218. 12.00 13.76
2 / 2: "12/2/99" "01:00:13.5" 36494. 36219. 1.00 14.76
3 / 3: "12/3/99x" "2:00x" 36495. 36220. 2.00 15.75
4 / 4: "12/4/99" "03:00" 36496. 36221. 3.00 16.76
Use the date variables in v3 and v4 to define time axes. The date encodings are as expected.
yes? DEFINE AXIS/T/UNITS=days/T0=1-jan-1900 tax = v3
yes? SHOW AXIS tax
name axis # pts start end
TAX TIME 4 r 01-DEC-1999 00:00 04-DEC-1999 00:00
T0 = 1-JAN-1900
Axis span (to cell edges) = 4
yes? DEFINE AXIS/T/UNITS=days/T0=1-jan-1900 tax = v4
Replacing definition of axis TAX
yes? SHOW AXIS tax
name axis # pts start end
TAX TIME 4 r 01-MAR-1999 00:00 04-MAR-1999 00:00
T0 = 1-JAN-1900
Axis span (to cell edges) = 4
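The encodings in the listing can be checked independently. The Python sketch below uses the standard `datetime` module to compute days since 1-Jan-1900 and decimal hours of the day; the function names are hypothetical, and the sketch only mirrors the arithmetic, not how Ferret parses the fields.

```python
from datetime import date

def days_since_1900(year, month, day):
    """Whole days elapsed between 1-Jan-1900 and the given date."""
    return (date(year, month, day) - date(1900, 1, 1)).days

def hours_of_day(h, m=0, s=0.0):
    """Time of day expressed as decimal hours."""
    return h + m / 60 + s / 3600

v3_first = days_since_1900(1999, 12, 1)          # 36493, as listed for V3
v4_first = days_since_1900(1999, 3, 1)           # 36218, as listed for V4
v6_first = round(hours_of_day(13, 45, 36.5), 2)  # 13.76, as listed for V6
```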
Next we'll specify each column's type. Only the first two characters of the type are needed. Columns 1 and 2, which contained errors, can now be read as date and time; only the entries with syntax errors (record 3) are returned as missing.
yes? COLUMNS/TYPE="da,ti,date, date, time, time" delimited_read_3.dat
yes? LIST v1,v2,v3,v4,v5,v6
DATA SET: ./delimited_read_3.dat
X: 0.5 to 4.5
Column 1: V1 is V1 (days)(Julian days since 1-Jan-1900)
Column 2: V2 is V2 (hours)(Time of day)
Column 3: V3 is V3 (days)(Julian days since 1-Jan-1900)
Column 4: V4 is V4 (days)(Julian days since 1-Jan-1900)
Column 5: V5 is V5 (hours)(Time of day)
Column 6: V6 is V6 (hours)(Time of day)
V1 V2 V3 V4 V5 V6
1 / 1: 36493. 12.00 36493. 36218. 12.00 13.76
2 / 2: 36494. 1.00 36494. 36219. 1.00 14.76
3 / 3: .... .... 36495. 36220. 2.00 15.75
4 / 4: 36496. 3.00 36496. 36221. 3.00 16.76
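The behavior shown above, where a forced type turns unparseable entries into missing values rather than changing the type of the whole column, can be sketched in Python. `parse_time` is a hypothetical helper that accepts only the `h:mm` and `h:mm:ss.s` forms seen in this file; `None` stands in for Ferret's missing-value flag.

```python
def parse_time(text):
    """Decimal hours for 'h:mm' or 'h:mm:ss.s'; None if the entry is malformed."""
    parts = text.strip().split(":")
    if len(parts) not in (2, 3):
        return None
    try:
        hours = int(parts[0])
        minutes = int(parts[1])
        seconds = float(parts[2]) if len(parts) == 3 else 0.0
    except ValueError:
        return None
    return hours + minutes / 60 + seconds / 3600

# Column 2 of delimited_read_3.dat.
column = ["12:00", "01:00:13.5", "2:00x", "03:00"]
values = [parse_time(t) for t in column]
# values[2] is None because "2:00x" does not parse; the other entries convert.
```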
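Delimiters can also be used to break up individual fields. Here both the slash and the comma are used as delimiters; the comma itself is written as \, (backslash and comma) because a plain comma separates the entries in the delimiter list.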
yes? FILE/FORM=delim/DELIM="/,\," delimited_read_3.dat
yes? LIST V1,V2,V3,V4,v5,v6
DATA SET: ./delimited_read_3.dat
X: 0.5 to 4.5
Column 1: V1
Column 2: V2
Column 3: V3
Column 4: V4
Column 5: V5
Column 6: V6
V1 V2 V3 V4 V5 V6
1 / 1: 12.00 1.000 "99" "12:00" 12.00 1.000
2 / 2: 12.00 2.000 "99" "01:00:13.5" 12.00 2.000
3 / 3: 12.00 3.000 "99x" "2:00x" 12.00 3.000
4 / 4: 12.00 4.000 "99" "03:00" 12.00 4.000
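Splitting on more than one delimiter can be mimicked in Python with `re.split`. The sketch below is illustrative only (`split_fields` is a hypothetical helper); it shows why the first six fields of record 1 become 12, 1, 99, 12:00, 12, and 1 once the slash is treated as a delimiter alongside the comma.

```python
import re

def split_fields(record, delimiters):
    """Split record on any of the given single-character delimiters."""
    pattern = "|".join(re.escape(d) for d in delimiters)
    return [field.strip() for field in re.split(pattern, record)]

# Record 1 of delimited_read_3.dat.
record = "12/1/99, 12:00, 12/1/99, 1999-03-01, 12:00, 13:45:36.5"
fields = split_fields(record, ["/", ","])
# fields[:6] -> ['12', '1', '99', '12:00', '12', '1']
```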