2.4 BINARY DATA

Last modified: Tue, 09/29/2020 - 12:47

Ferret can read binary data files that are formatted with and without FORTRAN record length headers (binary files without FORTRAN record length formatting are also known as "stream" files).

NOTE: These file types are not actively supported. The capabilities have historically been part of Ferret/PyFerret, but more modern file formats are encouraged for their machine independence and self-documentation.

2.4.1 FORTRAN-structured binary files

Files containing record length information are created by FORTRAN programs using the ACCESS="SEQUENTIAL" (the FORTRAN default) mode of file creation and also by Ferret using LIST/FORMAT=unf. Files that contain FORTRAN record length headers must have all data aligned on a 4-byte boundary. Suppose "rrrr" represents 4 bytes of record length information and "dddd" represents a 4-byte data value. Then FORTRAN-structured files are organized in one of the following two ways:

2.4.1.1 Records of uniform length

A FORTRAN-structured file with records of uniform length (3 single-precision floating point data values per record in this figure) looks like this:

rrrr dddd dddd dddd rrrr ...

FORTRAN code that creates a data file of this type might look something like this (sequential access is the default and need not be specified in the OPEN statement):

REAL VAR1(10), VAR2(10), VAR3(10)
...
OPEN(UNIT=20,FORM='UNFORMATTED',ACCESS='SEQUENTIAL',FILE='MYFILE.DAT')
...
DO 10 I=1,10
WRITE (20) VAR1(I), VAR2(I), VAR3(I)
10 CONTINUE
....

To access data from this file, use

yes? SET DATA/EZ/FORMAT=UNF/VAR=var1,var2,var3/COL=3 myfile.dat
or, equivalently,
yes? FILE/FORMAT=UNF/VAR=var1,var2,var3/COLUMNS=3 myfile.dat

This is very similar to accessing ASCII data with the addition of the /FORMAT=unf qualifier. The /COLUMNS= qualifier tells Ferret the number of data values per record. Although optional in the above example, this qualifier is required if the number of data values per record is greater than the number of variables being read (examples follow in section "ASCII Data").
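The record layout described above can be inspected outside Ferret. The following is a minimal Python sketch, not part of Ferret, that writes and reads a file shaped like the FORTRAN example. It assumes 4-byte little-endian record markers and REAL*4 data, which is common but depends on the compiler and machine; the function names are hypothetical.

```python
import os
import struct
import tempfile


def write_uniform_records(path, rows):
    """Write each row as one FORTRAN-style sequential record.

    Assumption: 4-byte little-endian record markers and REAL*4 data.
    """
    with open(path, "wb") as f:
        for row in rows:
            payload = struct.pack("<%df" % len(row), *row)
            marker = struct.pack("<i", len(payload))  # the "rrrr" word
            f.write(marker + payload + marker)


def read_uniform_records(path, ncols):
    """Read records of ncols REAL*4 values, checking both markers."""
    nbytes = 4 * ncols
    rows = []
    with open(path, "rb") as f:
        while True:
            head = f.read(4)
            if not head:
                break  # clean end of file
            (length,) = struct.unpack("<i", head)
            if length != nbytes:
                raise ValueError("unexpected record length %d" % length)
            rows.append(struct.unpack("<%df" % ncols, f.read(nbytes)))
            (tail,) = struct.unpack("<i", f.read(4))
            if tail != length:
                raise ValueError("record markers disagree")
    return rows


# Three variables, ten records, as in the FORTRAN example above.
demo_rows = [(float(i), 10.0 * i, 100.0 * i) for i in range(1, 11)]
demo_path = os.path.join(tempfile.mkdtemp(), "myfile.dat")
write_uniform_records(demo_path, demo_rows)
demo_read = read_uniform_records(demo_path, 3)
```

Each tuple in demo_read corresponds to one record, that is, one (var1, var2, var3) triple, which is the grouping that /COLUMNS=3 describes to Ferret.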

2.4.1.2 Records of non-uniform length

A FORTRAN-structured file with variable-length records might look like this:

rrrr dddd dddd rrrr
rrrr dddd rrrr
rrrr dddd dddd dddd dddd rrrr
etc.

With care, it is possible to read a data file containing variable-length records which was created using the simplest unformatted FORTRAN OPEN statement and a single WRITE statement for each variable. Use /FORMAT=stream to read such files. Note that sequential access is the FORTRAN default and does not need to be specified in the OPEN statement:

REAL VAR1(1000), VAR2(500)
...
OPEN (UNIT=20, FORM="UNFORMATTED", FILE="MYFILE.DAT")
...
WRITE (20) VAR1
WRITE (20) VAR2
....

Use the qualifier /SKIP to skip past the record length information (/SKIP arguments are in units of 4-byte words), and define a grid which will not read past the data values. The /COLUMNS= qualifier can be used when reading multiple variables to specify the number of words separating the start of each variable:

yes? DEFINE AXIS/X=1:500:1 xaxis
yes? DEFINE GRID/X=XAXIS mygrid
yes? FILE/FORMAT=stream/SKIP=1003/GRID=mygrid/VAR=var2 myfile.dat

The argument 1003 is the sum of the 1000 data words in record 1, plus 2 words of record length information surrounding the data values in record 1 (variable var1), plus 1 word of record information preceding the data in record 2.
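The skip count can be checked with a short calculation. The Python sketch below is illustrative only and is not a Ferret tool: it assumes 4-byte words (REAL*4 data and 4-byte little-endian record markers), and the helper names are hypothetical. It writes a two-record file like the FORTRAN example, computes the number of words before the data of the second record, and reads var2 from that offset.

```python
import os
import struct
import tempfile

WORD = 4  # bytes per word assumed here: REAL*4 data and 4-byte markers


def skip_words_before(record_lengths, target):
    """Words to skip to reach the data of record `target` (0-based).

    Every earlier record contributes its data words plus 2 marker words;
    the target record contributes 1 leading marker word.
    """
    return sum(n + 2 for n in record_lengths[:target]) + 1


def write_record(f, values):
    """Write one sequential record: marker, REAL*4 payload, marker."""
    payload = struct.pack("<%df" % len(values), *values)
    marker = struct.pack("<i", len(payload))
    f.write(marker + payload + marker)


var1 = [float(i) for i in range(1000)]
var2 = [float(-i) for i in range(500)]
path = os.path.join(tempfile.mkdtemp(), "myfile.dat")
with open(path, "wb") as f:
    write_record(f, var1)
    write_record(f, var2)

skip = skip_words_before([1000, 500], 1)
with open(path, "rb") as f:
    f.seek(skip * WORD)
    var2_read = list(struct.unpack("<500f", f.read(500 * WORD)))
```

Here skip evaluates to 1000 + 2 + 1 = 1003, the value passed to /SKIP above.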

When reading from stream or binary files, the entire grid is read when the data is requested. To read subsets of the data, it may be possible to define a smaller grid to read a subset of records and perhaps write that out to a netCDF file, then do a second read, skipping those first records, and so on.

2.4.1.3 Fortran binary files, variables on different grids.

Some FORTRAN-structured files have multiple variables per record which do not share a common grid. An example would be one year of a global monthly field stored as twelve records like this:

rrrr year month field(360x180) rrrr

The data file size is (1+1+1+360*180+1)*12*4 = 3110592 bytes. Such a file cannot be read with the /FORMAT=unf qualifier but can be read with the /FORMAT=stream qualifier described in the next section. By including the /SWAP qualifier, this technique can be used to read files created on a machine with a different byte ordering.

The following commands will read this file and assign the data to the appropriate grid:

yes? ! Create an X axis for an entire record.
yes? DEFINE AXIS/X=1:`3+360*180+1`:1 binary_x
yes? DEFINE AXIS/T=1:12:1 binary_t
yes? DEFINE GRID/X=binary_x/T=binary_t binary_g

yes? ! Read in everything.
yes? FILE/FORMAT=stream/G=binary_g/VAR=val binary_file

yes? ! Create the grid for the data field.
yes? DEFINE AXIS/MODULO/X=0.5:359.5:1 1deg_x
yes? DEFINE AXIS/Y=-89.5:89.5:1 1deg_y
yes? DEFINE AXIS/T=15-jan-1999:15-dec-1999:1/UNITS=month month_1999_t
yes? DEFINE GRID/X=1deg_x/Y=1deg_y/T=month_1999_t 1deg_1999_g

yes? ! Create a variable that uses this grid.
yes? LET dummy = x[GX=R_1deg_1999_g] + y[GY=R_1deg_1999_g] + t[GT=R_1deg_1999_g]

yes? ! Reshape the data portion of val onto the data grid.
yes? LET field = RESHAPE(val[i=4:`3+360*180`],dummy)
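The word positions used in these commands can be sanity-checked with a small Python sketch. It is not Ferret code; the function names are hypothetical, and it assumes that every item in a record (year, month, and each field value) occupies one 4-byte word, read here as little-endian REAL*4, with X varying fastest within the field.

```python
import struct


def words_per_record(nx, ny):
    """Words in one record: marker + year + month + field + marker."""
    return 1 + 1 + 1 + nx * ny + 1


def file_size_bytes(nx, ny, nrec, word=4):
    """Total file size for nrec records of the layout above."""
    return words_per_record(nx, ny) * nrec * word


def split_record(words, nx, ny):
    """Split one record's words into (year, month, field rows).

    `words` holds every word of the record, markers included, so the
    field occupies 1-based positions 4 .. 3 + nx*ny, matching the
    val[i=4:...] range used above. X is assumed to vary fastest.
    """
    year, month = words[1], words[2]
    flat = words[3:3 + nx * ny]
    field = [flat[j * nx:(j + 1) * nx] for j in range(ny)]
    return year, month, field


# A tiny 3x2 stand-in record, read the way a stream read sees it:
# the marker words are read as (meaningless) floats along with the data.
payload = struct.pack("<8f", 1999.0, 1.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0)
marker = struct.pack("<i", len(payload))
record = marker + payload + marker
words = struct.unpack("<%df" % words_per_record(3, 2), record)
year, month, field = split_record(words, 3, 2)
```

For the full-size case, words_per_record(360, 180) gives the 64804 words used for the binary_x axis, and file_size_bytes(360, 180, 12) reproduces the 3110592-byte file size.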

2.4.2 Stream binary files

Files without embedded record length information are created by FORTRAN programs using ACCESS="DIRECT" in OPEN statements and by C programs using the C stdio library. These files can contain a mix of integer and real numbers. The following types can be read from an unstructured file:

FORTRAN	C	Size in bytes

INTEGER*1	char	1
INTEGER*2	short	2
INTEGER*4	int	4
REAL*4	float	4
REAL*8	double	8
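For readers who inspect such files with a script, the sizes in this table can be cross-checked against Python's struct module. This is a small sketch outside Ferret; the dictionary name is hypothetical, and the pairing of each FORTRAN type with a struct format code is this example's own mapping.

```python
import struct

# FORTRAN type -> (C type, struct format code). The "<" prefix requests
# standard sizes with no padding, so the sizes do not depend on the host.
STREAM_TYPES = {
    "INTEGER*1": ("char", "b"),
    "INTEGER*2": ("short", "h"),
    "INTEGER*4": ("int", "i"),
    "REAL*4": ("float", "f"),
    "REAL*8": ("double", "d"),
}

sizes = {name: struct.calcsize("<" + code)
         for name, (_ctype, code) in STREAM_TYPES.items()}
```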

2.4.2.1 Simple stream files

Suppose "dddd" represents a 4-byte data value. Then a stream (or "direct access") binary file of FORTRAN REAL*4 or C floats is:

dddd dddd dddd dddd dddd dddd ...

The structure of the records is implied by the program accessing the data. FORTRAN code which generates a direct access binary file might look like this:

REAL*4 MYVAR(10,5)
...
C Use RECL=40 on machines that specify RECL in bytes
OPEN(UNIT=20, FILE="myfile.dat", ACCESS="DIRECT", RECL=10)
...
DO 100 j = 1, 5
100 WRITE (20,REC=j) (MYVAR(i,j),i=1,10)
....

Use the following Ferret commands to read variable "myvar" from this file:

yes? DEFINE AXIS/X=1:10:1 x10
yes? DEFINE AXIS/Y=1:5:1 y5
yes? DEFINE GRID/X=x10/Y=y5 g10x5
yes? FILE/VAR=MYVAR/GRID=g10x5/FORMAT=stream myfile.dat
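The ordering of values in such a file can be illustrated with a short Python sketch. It is not Ferret code: it stands in for the FORTRAN program by writing 5 records of 10 REAL*4 values with no record markers, assumes little-endian byte order, and uses a hypothetical helper to read the values back with X varying fastest, as the 10x5 grid above implies.

```python
import os
import struct
import tempfile

NX, NY = 10, 5

# Stand-in for the FORTRAN program: 5 records of 10 REAL*4 values,
# written back to back with no record markers (little-endian assumed).
myvar = [[100.0 * j + i for i in range(1, NX + 1)] for j in range(1, NY + 1)]
path = os.path.join(tempfile.mkdtemp(), "myfile.dat")
with open(path, "wb") as f:
    for row in myvar:
        f.write(struct.pack("<%df" % NX, *row))


def read_stream_grid(path, nx, ny):
    """Read nx*ny REAL*4 values and return rows with X varying fastest."""
    with open(path, "rb") as f:
        flat = struct.unpack("<%df" % (nx * ny), f.read(4 * nx * ny))
    return [list(flat[j * nx:(j + 1) * nx]) for j in range(ny)]


grid = read_stream_grid(path, NX, NY)
```

The file is exactly 10 * 5 * 4 = 200 bytes, since nothing but data values is stored.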

If the file instead consisted of FORTRAN REAL*8 values or C doubles, the following Ferret commands would read the data into "myvar":

yes? DEFINE AXIS/X=1:10:1 x10
yes? DEFINE AXIS/Y=1:5:1 y5
yes? DEFINE GRID/X=x10/Y=y5 g10x5
yes? FILE/VAR=MYVAR/GRID=g10x5/FORMAT=stream/type=r8 myfile.dat

Note the addition of the "type" qualifier. See SET DATA/FORMAT=stream for more details.

When reading from stream or binary files, the entire grid is read when the data is requested. To read subsets of the data, define a smaller grid to read a subset of records and perhaps write that out to a netCDF file, then do a second read, skipping those first records, and so on.

Since Ferret represents all variables as REAL*8, it reads all data into 8-byte words. When writing data with the LIST/FORMAT=stream command, it writes 8-byte words by default.

Beginning with Ferret v7.45, the qualifier /OUTTYPE may be used with LIST/FORMAT=stream.

2.4.2.2 Mixed stream files

Ferret can read binary files that contain a mix of numbers of different types. However, a given Ferret variable can only be one type. Say you have a file containing a mix of REAL*8 and REAL*4 numbers. The following would successfully read the file:

yes? FILE/VAR=MYDOUBLE,MYFLOAT/GRID=somegrid/FORMAT=stream/type=r8,r4 myfile.dat

while:

yes? FILE/VAR=MYDOUBLE/GRID=someothergrid/FORMAT=stream/type=r8,r4 myfile.dat

would fail.

2.4.2.3 Byte-swapped stream files

Stream files with byte-swapped numbers can be read with the /SWAP qualifier. Note that the /ORDER and /SKIP qualifiers are also available (see chapter "Data Set Basics", section "Reading ASCII files", for more details on /ORDER and /SKIP).
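What byte swapping means for a single value can be shown with a few lines of Python. This sketch is independent of Ferret and only illustrates the idea behind /SWAP for one REAL*4 word; the variable names are hypothetical.

```python
import struct

value = 1.5

# The same REAL*4 value as written on big- and little-endian machines.
big = struct.pack(">f", value)
little = struct.pack("<f", value)

# Reading big-endian bytes with the wrong byte order gives a wrong number.
(misread,) = struct.unpack("<f", big)

# Reversing the 4 bytes of the word (a byte swap) recovers the value.
(swapped,) = struct.unpack("<f", big[::-1])
```

The two encodings hold the same four bytes in opposite order, so a file written on one kind of machine is readable on the other once each word is reversed.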
