2.5 ASCII DATA

Last modified: Wed, 05/06/2020 - 09:18

To access ASCII data file sets use

yes? SET DATA/EZ ASCII_file_name 
yes? ! or equivalently
yes? FILE ASCII_file_name

The following are qualifiers to SET DATA/EZ or FILE:


Qualifier	Description
/VARIABLES	names the variables in the file
/TITLE	associates a title with the data set
/GRID	indicates multi-dimensional data and units
/COLUMNS	tells how many data values are in each record
/FORMAT	specifies the format of the records
/SKIP	skips initial records of the file
/ORDER	specifies order of axes (which varies fastest)

Use command SET VARIABLE to individually customize the variables.

NOTE that with the FILE or SET DATA/EZ command, there is an upper limit on the number of variables that can be read in free format from a single ASCII file. The SET DATA/FORM=DELIMITED command allows for 100 variables per file.

When reading from ascii, the entire grid and all variables will be read when the data is requested. To read subsets of the data, define a smaller grid to read a subset of records and perhaps write that to a netCDF file, then do a second read, skipping those first records, and so on.

2.5.1 Reading ASCII files

Below are several examples of reading ASCII data properly. (Uniform record length, FORTRAN-structured binary data are read similarly with the addition of the qualifier /FORMAT="unf". See the chapter on "Data Set Basics", section "Binary Data", for other binary types.) First, we look briefly at the relationship between Ferret and standard matrix notation.

Linear algebra uses established conventions in matrix notation. In a matrix A(i,j), the first index denotes a (horizontal) row and the second denotes a (vertical) column.


A11	A12	A13	...	A1n
A21	A22	A23	...	A2n	Matrix A(i,j)
...
Am1	Am2	Am3	...	Amn

X-Y graphs follow established conventions as well, which are that X is the horizontal axis (and in a geographical context, the longitude axis) and increases to the right, and Y is the vertical axis (latitude) and increases upward (Ferret provides the /DEPTH qualifier to explicitly designate axes where the vertical axis convention is reversed).

In Ferret, the first index of a matrix, i, is associated with the first index of an (x,y) pair, x. Likewise, j corresponds to y. Element Am2, for example, corresponds graphically to x=m and y=2.

By default, Ferret stores data in the same manner as FORTRAN: the first index varies fastest. Use the qualifier /ORDER to alter this behavior. The following examples demonstrate how Ferret handles matrices.

Example 1, 1 variable, 1 dimension

1a) Consider a data set containing the height of a plant at regular time intervals, listed in a single column:

2.3
3.1
4.5
5.6
. . .

To access, name, and plot this variable properly, use the commands

yes? FILE/VAR=height plant.dat
yes? PLOT height

However the data should be on a time axis. The above gives us a simple first look at the data, but we want to plot it as a time series. First define the time axis and a grid consisting of that axis, and define the data on that grid. See examples below for multi-dimensional data.

yes? DEFINE AXIS/T=1-JUN-2014:15-JUL-2014:1/UNITS=days/T0=1-JUN-2014  tdays
yes? DEFINE GRID/T=tdays tgrid
yes? FILE/VAR=height/GRID=tgrid plant.dat
yes? PLOT height

1b) Now consider the same data, except listed in four columns:

2.3 3.1 4.5 5.6
5.7 5.9 6.1 7.2
. . .

Because there are more values per record (4) than variables (1), use:

yes? FILE/VAR=height/COLUMNS=4 plant4.dat
yes? PLOT height

Example 2, 1 variable, 1 dimension, with a large number of data points

The simple FILE command lets Ferret set up a grid.

yes? FILE/VAR=height plant.dat

Beginning with Ferret v6.93, if the FILE command is given without a grid, Ferret determines the size of the file and defines an axis and grid on which to read the variables. It takes any /SKIP and /COLUMNS qualifiers into account. This means that `VAR,RETURN=isize` returns the correct size for variables in the data set.

[Prior to Ferret v6.93, SET DATA/EZ/VAR= used on its own created an abstract axis of fixed length, 20480 points. If your data set is larger than that, you can read it by defining an axis of appropriate length. Set the length to a number equal to or larger than the number of data values. The PLOT command will plot the actual number of points in the file.]

yes? DEFINE AXIS/X/X=1:50000:1 longax
yes? DEFINE GRID/X=longax biggrid
yes? FILE/VAR=height/GRID=biggrid plant.dat
yes? PLOT height

Example 3, 2 variables, 1 dimension

3a) Consider a data set containing the height of a plant and the amount of water given to the plant, measured at regular time intervals:

2.3 20.4
3.1 31.2
4.5 15.7
5.6 17.3
. . .

To read and plot this data use

yes? FILE/VAR="height,water" plant_wat.dat
yes? PLOT height,water

3b) The number of columns need be specified only if the number of columns exceeds the number of variables. If the data are in six columns

2.3 20.4 3.1 31.2 4.5 15.7 
5.6 17.3 ...

use

yes? FILE/VAR="height,water"/COLUMNS=6 plant_wat6.dat
yes? PLOT height,water
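
The way free-format values are dealt out to variables can be pictured outside Ferret. The Python sketch below is only an illustration and is not Ferret code. It assumes, as Examples 3a and 3b suggest, that values are taken in file order and handed to the variables in rotation, however many values sit on each record. The function name deal_values is hypothetical.

```python
def deal_values(records, var_names):
    """Assign whitespace-separated values to variables in rotation.

    records   -- list of text lines, any number of values per line
    var_names -- variable names in the order they would be given to /VAR
    """
    # Flatten all records into one stream of numbers, in file order.
    stream = [float(tok) for line in records for tok in line.split()]
    nvars = len(var_names)
    # Value number k (counting from 0) belongs to variable k mod nvars.
    return {name: stream[i::nvars] for i, name in enumerate(var_names)}


# Two values per record (layout of Example 3a).
two_columns = ["2.3 20.4", "3.1 31.2", "4.5 15.7", "5.6 17.3"]
# Six values per record (layout of Example 3b).
six_columns = ["2.3 20.4 3.1 31.2 4.5 15.7", "5.6 17.3"]

result = deal_values(six_columns, ["height", "water"])
print(result["height"])   # [2.3, 3.1, 4.5, 5.6]
print(result["water"])    # [20.4, 31.2, 15.7, 17.3]
```

Under this assumption both layouts give the same two series, which is why only the number of values per record has to be declared.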

Example 4, 1 variable, 2 dimensions

In addition to the example below, see the FAQ, Reading ASCII data representing gridded data, for ascii files which have coordinate data listed in the file.

4a) Consider a different situation: a greenhouse with three rows of four plants and a file with a single column of data representing the height of each plant at a single time (successive values represent plants in a row of the greenhouse):

3.1
2.6
5.4
4.6
3.5
6.1
. . .

If we want to produce a contour plot of height as a function of position in the greenhouse, axes will have to be defined:

yes? DEFINE AXIS/X=1:4:1 xplants
yes? DEFINE AXIS/Y=1:3:1 yplants
yes? DEFINE GRID/X=xplants/Y=yplants gplants
yes? FILE/VAR=height/GRID=gplants greenhouse_plants.dat
yes? CONTOUR height

When reading data the first index, x, varies fastest. Schematically, the data will be assigned as follows:

        x=1   x=2   x=3   x=4
y=1     3.1   2.6   5.4   4.6
y=2     3.5   6.1   . . .
y=3     . . .
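
The assignment shown in the schematic can be reproduced with a few lines of Python. This is an illustrative sketch, not Ferret code: the function fill_grid is hypothetical, and it simply applies the rule stated above that the first index, x, varies fastest.

```python
def fill_grid(values, nx, ny):
    """Place a flat list on an nx-by-ny grid with x varying fastest.

    Returns a dict keyed by 1-based (x, y) positions.
    """
    grid = {}
    for k, value in enumerate(values):
        x = k % nx + 1      # x cycles 1..nx as each value is read
        y = k // nx + 1     # y advances once per complete row of x
        if y > ny:
            break           # ignore values beyond the defined grid
        grid[(x, y)] = value
    return grid


heights = [3.1, 2.6, 5.4, 4.6, 3.5, 6.1]   # first six values of the file
grid = fill_grid(heights, nx=4, ny=3)
print(grid[(4, 1)])   # 4.6, last plant of the first row
print(grid[(2, 2)])   # 6.1, second plant of the second row
```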

4b) If the file in the above example has, instead, 4 values per record:

3.1 2.6 5.4 4.6
3.5 6.1 . . .

then add /COLUMNS=4 to the FILE command:

yes? FILE/VAR=height/COLUMNS=4/GRID=gplants greenhouse_plants.dat

Example 5, 2 variables, 2 dimensions

As in Example 4, consider a greenhouse with three rows of four plants each, and a data set with the height of each plant and the length of its longest leaf:

3.1 0.54
2.6 0.37
5.4 0.66
4.6 0.71
3.5 0.14
6.1 0.95
. .
. .

Again, axes and a grid must be defined:

yes? DEFINE AXIS/X=1:4:1 xht_leaf
yes? DEFINE AXIS/Y=1:3:1 yht_leaf
yes? DEFINE GRID/X=xht_leaf/Y=yht_leaf ght_leaf
yes? FILE/VAR="height,leaf"/GRID=ght_leaf greenhouse_ht_lf.dat
yes? SHADE height
yes? CONTOUR/OVER leaf

The above commands create a color-shaded plot of height in the greenhouse, and overlay a contour plot of leaf length. Schematically, the data will be assigned as follows:

        x=1          x=2          x=3          x=4
        ht , lf      ht , lf
y=1     3.1, 0.54    2.6, 0.37    5.4, 0.66    4.6, 0.71
y=2     3.5, 0.14    6.1, 0.95    . . .
y=3     . . .

Example 6, 2 variables, 3 dimensions (time series)

Consider the same greenhouse with height and leaf length data taken at twelve different times. The following commands will create a three-dimensional grid and a plot of the height and leaf length versus time for a specific plant.

yes? DEFINE AXIS/X=1:4:1 xplnt_tm
yes? DEFINE AXIS/Y=1:3:1 yplnt_tm
yes? DEFINE AXIS/T=1:12:1 tplnt_tm
yes? DEFINE GRID/X=xplnt_tm/Y=yplnt_tm/T=tplnt_tm gplant2
yes? FILE/VAR="height,leaf"/GRID=gplant2 green_time.dat
yes? PLOT/X=3/Y=2 height, leaf

Example 7, 1 variable, 3 dimensions, permuted order (vertical profile)

Consider a collection of oceanographic measurements made to a depth of 1000 meters. Suppose that the data file contains only a single variable, salt. Each record contains a vertical profile (11 values) of a particular x,y (long,lat) position. Supposing that successive records are successive longitudes, the data file would look as follows (assume the equivalencies are not in the file):

              z=0    z=100  z=200  . . .
x=30W,y=5S    35.89  35.90  35.93  35.97  36.02  36.05  35.96  35.40  35.13  34.89  34.72
x=29W,y=5S    35.89  35.91  35.94  35.97  36.01  36.04  35.94  35.39  35.13  34.90  34.72
. . .

Use the qualifier /DEPTH when defining the Z axis to indicate positive downward, and /ORDER when setting the data set to properly read in the permuted data:

yes? DEFINE AXIS/X=30W:25W:1/UNIT=degrees salx
yes? DEFINE AXIS/Y=5S:5N:1/UNIT=degrees saly
yes? DEFINE AXIS/Z=0:1000:100/UNIT=meters/DEPTH salz
yes? DEFINE GRID/X=salx/Y=saly/Z=salz salgrid
yes? FILE/ORDER=zxy/GRID=salgrid/VAR=salt/COL=11 sal.dat
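
To see what the permuted order means for individual values, the Python sketch below maps a position in the file to grid indices. It is an illustration only, not Ferret code. It assumes that the letters given to /ORDER run from the fastest-varying axis to the slowest, which matches the file described above (depth varies within a record, longitude from record to record). The function name index_to_position is hypothetical.

```python
def index_to_position(k, order, sizes):
    """Convert a 0-based position in the file to 1-based axis indices.

    order -- axis letters from fastest to slowest varying, e.g. "zxy"
    sizes -- dict of axis lengths, e.g. {"x": 6, "y": 11, "z": 11}
    """
    position = {}
    for axis in order:
        position[axis] = k % sizes[axis] + 1   # index along this axis
        k //= sizes[axis]                      # carry to the next axis
    return position


# Axis lengths implied by 30W:25W:1, 5S:5N:1 and 0:1000:100.
sizes = {"x": 6, "y": 11, "z": 11}

first = index_to_position(0, "zxy", sizes)    # first value of record 1
second = index_to_position(11, "zxy", sizes)  # first value of record 2
seventh = index_to_position(66, "zxy", sizes) # first value of record 7
print(first, second, seventh)
```

With these sizes, the first value of the second record lands at the second longitude, and the seventh record starts the second latitude.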

See also:
EXPNDI_BY_T function

EXPNDI_BY_Z function

These functions take 1-D lists of data and put them onto an X-Z or X-T grid, so that a 1-D list which represents a set of profiles or a set of time series may be defined with the correct Z or T axes.

2.5.2 Reading "DELIMITED" data files

SET DATA/EZ/FORMAT=DELIMITED[/DELIMITERS=][/TYPE=][/VAR=] filename

For "delimited" files, such as comma-delimited or tab-delimited output of spreadsheets, SET DATA/EZ/FORMAT=DELIMITED initializes files of mixed numerical, string, and date fields. If the data types are not specified the file is analyzed automatically to determine data types.

The alias COLUMNS stands for "SET DATA/FORMAT=DELIMITED".

Up to 100 variables can be read from a single delimited file. (The simple ASCII reading commands are limited to 20 variables.)

Example 1: Strings, latitudes, longitudes, and numeric data.

This file is delimited by commas. Some entries are null; they are indicated by two commas with no space between. File delimited_read_1.dat contains:

col1, col2 col3 col4 col5 col6 col7
one ,, 1.1, 24S, 130E ,, 1e1
two ,, 2.2, 24N, 130W, 2S
three ,, 3.3, 24, 130, 3N, 3e-2 

five ,, 4.4, -24, -130, 91, -4e2
extra line

If there is no /TYPE qualifier, the data type is automatically determined. If all entries in the column match a data type they are assigned that type. First let's try the file as is, using automatic analysis. Record 1 contains 5 column headings (text) so V1 through V5 are analyzed as text variables.

yes? FILE/FORMAT=delim delimited_read_1.dat 
yes? LIST v1,v2,v3,v4,v5,v6,v7,v8,v9,v10
 DATA SET: ./delimited_read_1.dat
 X: 0.5 to 7.5
 Column 1: V1
 Column 2: V2
 Column 3: V3
 Column 4: V4
 Column 5: V5
 Column 6: V6
 Column 7: V7
 Column 8: V8
 Column 9: V9
 Column 10: V10
 V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1 / 1: "col1" "col2" "col3" "col4" "col5" " " .... " " " " ....
2 / 2: "one" " " "1.1" "24S" "130E" " " 10.0 "word 1" " " ....
3 / 3: "two" " " "2.2" "24N" "130W" "2S" .... "word 2" " " ....
4 / 4: "three" " " "3.3" "24" "130" "3N" 0.0 " wd 3 " " " ....
5 / 5: " " " " " " " " " " " " .... " " " " ....
6 / 6: "five" " " "4.4" "-24" "-130" "91" -400.0 "word 4" "aa" 77.00
7 / 7: "extra line" " " " " " " " " " " .... " " " " ....

Now skip the first record to do a better "analysis" of the file fields. Explicitly name the variables. Note that A3 is correctly analyzed as numeric, A4 as latitude, and A5 as longitude. A6 is analyzed as string data, because the value 91 in record 5 does not fall in the range for latitudes, and records 2 and 3 contain mixed numbers and letters.

yes? FILE/FORMAT=DELIM/SKIP=1/VAR="a1,a2,a3,a4,a5,a6,a7,a8,a9" delimited_read_1.dat
yes? LIST a1,a2,a3,a4,a5,a6,a7
 DATA SET: ./delimited_read_1.dat
 X: 0.5 to 6.5
 Column 1: A1
 Column 2: A2 is A2 (all values missing)
 Column 3: A3
 Column 4: A4 is A4 (degrees_north)(Latitude)
 Column 5: A5 is A5 (degrees_east)(Longitude)
 Column 6: A6
 Column 7: A7
 A1 A2 A3 A4 A5 A6 A7
1 / 1: "one" ... 1.100 -24.00 130.0 " " 10.0
2 / 2: "two" ... 2.200 24.00 -130.0 "2S" ....
3 / 3: "three" ... 3.300 24.00 130.0 "3N" 0.0
4 / 4: " " ... .... .... .... " " ....
5 / 5: "five" ... 4.400 -24.00 -130.0 "91" -400.0
6 / 6: "extra line"... .... .... .... " " ....
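
The idea that a column receives a type only when every non-blank entry fits that type can be sketched in Python. This is not Ferret's analysis code. The patterns, the limits, and the function name classify are assumptions chosen so that the result agrees with the listing above, and Ferret may accept or reject forms that this sketch does not.

```python
import re

# Hypothetical patterns; Ferret's own analysis may accept other forms.
NUMBER = r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?"
LAT = re.compile(rf"({NUMBER})([NSns]?)")
LON = re.compile(rf"({NUMBER})([EWew]?)")


def classify(column):
    """Guess a single type for a whole column of text fields."""
    entries = [field.strip() for field in column if field.strip()]
    if not entries:
        return "all values missing"

    def all_match(pattern, limit):
        saw_letter = False
        for entry in entries:
            m = pattern.fullmatch(entry)
            if m is None or abs(float(m.group(1))) > limit:
                return False
            saw_letter = saw_letter or bool(m.group(2))
        return saw_letter   # assumed: at least one hemisphere letter

    if all(re.fullmatch(NUMBER, e) for e in entries):
        return "numeric"
    if all_match(LAT, 90.0):
        return "latitude"
    if all_match(LON, 360.0):
        return "longitude"
    return "string"


# Columns A3 to A6 of delimited_read_1.dat after skipping record 1.
a3 = ["1.1", "2.2", "3.3", "", "4.4", ""]
a4 = ["24S", "24N", "24", "", "-24", ""]
a5 = ["130E", "130W", "130", "", "-130", ""]
a6 = ["", "2S", "3N", "", "91", ""]
print([classify(c) for c in (a3, a4, a5, a6)])
```

In this sketch a single entry that breaks the pattern, such as 91 in A6, is enough to make the whole column a string column.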

Now use the /TYPE qualifier to specify that all columns be treated as numeric.

yes? FILE/FORMAT=delim/SKIP=1/TYPE=numeric delimited_read_1.dat
yes? LIST v1,v2,v3,v4,v5,v6,v7,v8
 DATA SET: ./delimited_read_1.dat
 X: 0.5 to 6.5
 Column 1: V1
 Column 2: V2
 Column 3: V3
 Column 4: V4
 Column 5: V5
 Column 6: V6
 Column 7: V7
 Column 8: V8
 V1 V2 V3 V4 V5 V6 V7 V8
1 / 1:...... 1.100 .... .... .... 10.0...
2 / 2:...... 2.200 .... .... .... .......
3 / 3:...... 3.300 24.00 130.0 .... 0.0...
4 / 4:...... .... .... .... .... .......
5 / 5:...... 4.400 -24.00 -130.0 91.00 -400.0...
6 / 6:...... .... .... .... .... .......

Here is how to read only the first line of the file. If the variables are not specified, 7 variables are generated, because auto-analysis of the file does not stop at the first record. Use the command COLUMNS, the alias for FILE/FORMAT=delimited.

yes? DEFINE AXIS/X=1:1:1 x1
yes? DEFINE GRID/X=x1 g1
yes? COLUMNS/GRID=g1 delimited_read_1.dat
yes? list v1,v2,v3,v4,v5,v6,v7
 DATA SET: ./delimited_read_1.dat
 X: 1
 Column 1: V1
 Column 2: V2
 Column 3: V3
 Column 4: V4
 Column 5: V5
 Column 6: V6
 Column 7: V7
 V1 V2 V3 V4 V5 V6 V7
I / *: "col1" "col2" "col3" "col4" "col5" " "...

Define the variables to read.

yes? COLUMNS/GRID=g1/VAR="c1,c2,c3,c4,c5" delimited_read_1.dat
yes? LIST c1,c2,c3,c4,c5
 DATA SET: ./delimited_read_1.dat
 X: 1
 Column 1: C1
 Column 2: C2
 Column 3: C3
 Column 4: C4
 Column 5: C5
 C1 C2 C3 C4 C5
I / *: "col1" "col2" "col3" "col4" "col5"

Example 2: File using blank as a delimiter.

Ferret recognizes the file as containing date and time variables, which are explored further in Example 3 below. Here is the file delimited_read_2.dat. Record 2 consists only of blanks.

1981/12/03 12:35:00

1895/2/6 13:45:05

Read the file using /DELIMITER=" "

yes? FILE/FORM=delimited/DELIMITER=" " delimited_read_2.dat
yes? LIST v1,v2
 DATA SET: ./delimited_read_2.dat
 X: 0.5 to 3.5
 Column 1: V1 is V1 (days)(Julian days since 1-Jan-1900)
 Column 2: V2 is V2 (hours)(Time of day)
 V1 V2
1 / 1: 37965. 12.58
2 / 2: .... ....
3 / 3: 39051. 13.75

More on delimiters: Any ASCII character can be used as a delimiter. If letters are used they are case-sensitive.

For example, suppose the file d2.dat contains

1234.44H59.23# -8.9
1111.23H59.23# 2.3
123.79H59.23#56.1

Read it by giving both H and # as delimiters:

yes? FILE/FORM=delim/DELIMITERS="H,#"/VAR="v1,v2,v3" d2.dat
yes? list/PRECISION=6 v1,v2,v3
             DATA SET: ./d2.dat
             X: 0.5 to 3.5
 Column  1: V1
 Column  2: V2
 Column  3: V3
              V1     V2       V3
1   / 1:  1234.44  59.2300  -8.9000
2   / 2:  1111.23  59.2300   2.3000
3   / 3:   123.79  59.2300  56.1000
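
The effect of several single-character delimiters can be imitated in Python with a regular-expression split. This is an illustrative sketch, not Ferret code, and the function name split_record is hypothetical.

```python
import re


def split_record(line, delimiters):
    """Split one record on any of the given single-character delimiters."""
    # re.escape guards characters that are special in regular expressions.
    # Letters are matched case-sensitively, so "H" does not match "h".
    pattern = "[" + "".join(re.escape(d) for d in delimiters) + "]"
    # float() tolerates the leading blank in fields such as " -8.9".
    return [float(field) for field in re.split(pattern, line)]


records = ["1234.44H59.23# -8.9", "1111.23H59.23# 2.3", "123.79H59.23#56.1"]
rows = [split_record(r, ["H", "#"]) for r in records]
for row in rows:
    print(row)
```

Each record yields three numbers, matching the three variables in the listing above.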

Example 3: dates and times

See the next example (Example 4) for reading a date and time as a single field.

Here is delimited_read_3.dat:

12/1/99, 12:00, 12/1/99, 1999-03-01, 12:00, 13:45:36.5
12/2/99, 01:00:13.5, 12/2/99, 1999-03-02, 01:00:13.5, 14:45:36.5
12/3/99, 02:00, 12/3/99, 1999-03-03, 2:00, 15:45
12/4/99, 03:00, 12/4/99, 1999-03-04, 03:00, 16:45:36.5

Read it with auto-detection of the field types:

yes? COLUMNS delimited_read_3.dat
yes? show data
 currently SET data sets:
 1> ./delimited_read_3.dat (default)
 name title I J K L
 V1 V1 1:4 ... ... ...
 (Julian days since 1-Jan-1900)
 V2 V2 1:4 ... ... ...
 (Time of day)
 V3 V3 1:4 ... ... ...
 (Julian days since 1-Jan-1900)
 V4 V4 1:4 ... ... ...
 (Julian days since 1-Jan-1900)
 V5 V5 1:4 ... ... ...
 (Time of day)
 V6 V6 1:4 ... ... ...
 (Time of day)

What if some of the fields contain errors? Here is delimited_read_3A.dat, with errors in the first two fields of record 3:

12/1/99, 12:00, 12/1/99, 1999-03-01, 12:00, 13:45:36.5
12/2/99, 01:00:13.5, 12/2/99, 1999-03-02, 01:00:13.5, 14:45:36.5
12/3/99x, 2:00x, 12/3/99, 1999-03-03, 2:00, 15:45
12/4/99, 03:00, 12/4/99, 1999-03-04, 03:00, 16:45:36.5

Read with auto-analysis. The records with syntax errors cause variables 1 and 2 to be read as string variables.

yes? COLUMNS delimited_read_3A.dat
yes? LIST v1,v2,v3,v4,v5,v6
 DATA SET: ./delimited_read_3A.dat
 X: 0.5 to 4.5
 Column 1: V1
 Column 2: V2
 Column 3: V3 is V3 (days)(Julian days since 1-Jan-1900)
 Column 4: V4 is V4 (days)(Julian days since 1-Jan-1900)
 Column 5: V5 is V5 (hours)(Time of day)
 Column 6: V6 is V6 (hours)(Time of day)
 V1 V2 V3 V4 V5 V6
1 / 1: "12/1/99" "12:00" 36493. 36218. 12.00 13.76
2 / 2: "12/2/99" "01:00:13.5" 36494. 36219. 1.00 14.76
3 / 3: "12/3/99x" "2:00x" 36495. 36220. 2.00 15.75
4 / 4: "12/4/99" "03:00" 36496. 36221. 3.00 16.76
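
The numeric date and time values in these listings can be checked independently. The Python sketch below is an illustration, not Ferret code. It assumes the standard Gregorian calendar and the US month/day/year reading of 12/1/99 as 1 December 1999, and the function names are hypothetical.

```python
from datetime import date


def days_since_1900(year, month, day):
    """Whole days elapsed since 1-Jan-1900 (that day itself counts as 0)."""
    return (date(year, month, day) - date(1900, 1, 1)).days


def hours_of_day(text):
    """Convert 'hh:mm' or 'hh:mm:ss.s' to decimal hours."""
    parts = [float(p) for p in text.split(":")]
    parts += [0.0] * (3 - len(parts))   # pad missing seconds with zero
    h, m, s = parts
    return h + m / 60.0 + s / 3600.0


print(days_since_1900(1999, 12, 1))          # 36493, as in V3 record 1
print(days_since_1900(1999, 3, 1))           # 36218, as in V4 record 1
print(round(hours_of_day("13:45:36.5"), 2))  # 13.76, as in V6 record 1
print(round(hours_of_day("15:45"), 2))       # 15.75, as in V6 record 3
```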

Use the date variables in v3 and v4 to define time axes.

yes? DEFINE AXIS/T/UNITS=days/T0=1-jan-1900 tax = v3
yes? SHOW AXIS tax
 name axis # pts start end
 TAX TIME 4 r 01-DEC-1999 00:00 04-DEC-1999 00:00
T0 = 1-JAN-1900
 Axis span (to cell edges) = 4


yes? DEFINE AXIS/T/UNITS=days/T0=1-jan-1900 tax = v4
Replacing definition of axis TAX
yes? SHOW AXIS tax
 name axis # pts start end
 TAX TIME 4 r 01-MAR-1999 00:00 04-MAR-1999 00:00
T0 = 1-JAN-1900
 Axis span (to cell edges) = 4

Next we'll specify each column's type. Only the first two characters of the type are needed. The columns that contained errors are now read with the requested types, and only the fields with errors in record 3 are returned as missing.

yes? COLUMNS/TYPE="da,ti,date, date, time, time" delimited_read_3A.dat
yes? LIST v1,v2,v3,v4,v5,v6
 DATA SET: ./delimited_read_3A.dat
 X: 0.5 to 4.5
 Column 1: V1 is V1 (days)(Julian days since 1-Jan-1900)
 Column 2: V2 is V2 (hours)(Time of day)
 Column 3: V3 is V3 (days)(Julian days since 1-Jan-1900)
 Column 4: V4 is V4 (days)(Julian days since 1-Jan-1900)
 Column 5: V5 is V5 (hours)(Time of day)
 Column 6: V6 is V6 (hours)(Time of day)
 V1 V2 V3 V4 V5 V6
1 / 1: 36493. 12.00 36493. 36218. 12.00 13.76
2 / 2: 36494. 1.00 36494. 36219. 1.00 14.76
3 / 3: .... .... 36495. 36220. 2.00 15.75
4 / 4: 36496. 3.00 36496. 36221. 3.00 16.76

Delimiters can be used to break up individual fields. Give both the slash and the comma as delimiters (the comma is written with a backslash, as \,) to read the month, day, and year into separate fields.

yes? FILE/FORM=delim/DELIM="/,\," delimited_read_3.dat
yes? list v1,v2,v3,v4
 DATA SET: ./delimited_read_3.dat
 X: 0.5 to 4.5
 Column 1: V1
 Column 2: V2
 Column 3: V3
 Column 4: V4 is V4 (hours)(Time of day)
 V1 V2 V3 V4
1 / 1: 12.00 1.000 99.00 12.00
2 / 2: 12.00 2.000 99.00 1.00
3 / 3: 12.00 3.000 99.00 2.00
4 / 4: 12.00 4.000 99.00 3.00

Example 4: dates-times as a single field

Beginning with Ferret v7.1, additional data types datime and edatime are introduced. These are combined "date and time" and "Eurodate and time".

Consider this file, delim_datetime.csv with two header lines and some combination date-time fields:

example file with Euro-date/time and US date/time records
index,euro-date-time, us-date-time, us-date, eurodate
1, 22/01/2014 01:00:00, 01/22/20 06:40:00, 5/20/95, 20/5/91
2, 22/01/2014 02:20:00, 01/22/21 07:40:00, 5/20/05, 20/6/93
3, 22/01/2014 03:40:00, 01/22/22 08:40:00, 5/20/15, 20/7/95
4, 22/01/2014 04:00:00, 01/22/23 09:40:00, 5/20/25, 20/8/99
5, 22/01/2014 05:20:00, 01/22/24 10:40:00, 5/20/35, 20/10/02

Read the data. Note that 2-digit years in the last two columns are interpreted as being either in the 1900's or the 2000's: years before 50 will be in the 2000's, 50 and higher will be put in the 1900's. 4-digit years are recommended.

yes? columns/skip=3/var="index,edtim,udtim,udate,edate"\
/type="num,edatime,datime,date,eurodate" delim_datetime.csv
yes? show data
 currently SET data sets:
 1> ./delim_datetime.csv (default)
 name title I J K L
 INDEX index 1:8 ... ... ...
 EDTIM edtim 1:8 ... ... ...
 (Julian days since 1-Jan-1900)
 UDTIM udtim 1:8 ... ... ...
 (Julian days since 1-Jan-1900)
 UDATE udate 1:8 ... ... ...
 (Julian days since 1-Jan-1900)
 EDATE edate 1:8 ... ... ...
 (Julian days since 1-Jan-1900)

yes? define axis/t/t0=1-JAN-1900/units=days time1 = edtim
yes? sh axis time1
 name axis # pts start end
 TIME1 TIME 5 i 22-JAN-2014 01:00 22-JAN-2014 05:20
T0 = 1-JAN-1900
 Axis span (to cell edges) = 0.2361111

yes? define axis/t/t0=1-JAN-1900/units=days time2 = udate
yes? sh axis time2
 name axis # pts start end
 TIME2 TIME 5 i 20-MAY-1995 00:00 20-MAY-2035 00:00
T0 = 1-JAN-1900
 Axis span (to cell edges) = 18262.5
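
The two-digit-year rule described in this example can be written as a short Python function. This is an illustration of the stated rule, not Ferret code, and the function name expand_year is hypothetical.

```python
def expand_year(yy):
    """Expand a two-digit year: below 50 is 20yy, 50 and above is 19yy."""
    if not 0 <= yy <= 99:
        raise ValueError("expected a two-digit year")
    return 2000 + yy if yy < 50 else 1900 + yy


# Years from the us-date column of delim_datetime.csv.
expanded = [expand_year(yy) for yy in (95, 5, 15, 25, 35)]
print(expanded)   # [1995, 2005, 2015, 2025, 2035]
```

The result agrees with the TIME2 axis above, which runs from 20-MAY-1995 to 20-MAY-2035.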

A quick note: On reading ASCII data, one sometimes sees the error message:

 
**TMAP ERR: Host is down
             Last or next-to-last record read:

"Host is down" is a system-generated error message, and means there is a read error. Check that the data in the file is numeric, that there aren't header records you have forgotten to skip, and that the file is a Unix-formatted file and not a Windows-formatted file.
