9.3.1 Understanding Ferret Memory Management Concepts and Tools
With Ferret v7.2, significant enhancements were made to how Ferret manages memory. Information about memory management for Ferret versions prior to v7.2 is preserved in section 9.3.3, below.
The SET MEMORY/SIZE=<size> command tells Ferret how much memory it is allowed to use for accessing scientific grids of data. The argument <size> is given in units of MegaWords, where each “word” is an 8-byte double. Thus, SET MEMORY/SIZE=10 tells Ferret it is allowed to use up to 80 megabytes of memory. If Ferret is unable to complete a calculation within this memory limit, it will return an “insufficient memory” error message. See also Strategies for Handling ‘Request Exceeds Memory Setting’ Errors, below.
For memory limits not quickly addressed through SET MEMORY/SIZE, an understanding of Ferret’s memory management techniques may help in devising a solution. Ferret memory usage can be thought of in three categories:
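The unit conversion for the SET MEMORY/SIZE argument can be checked with a few lines of Python. This is only an arithmetic sketch; the function name is hypothetical and is not part of Ferret or PyFerret.

```python
BYTES_PER_WORD = 8  # the guide defines a "word" as an 8-byte double

def megawords_to_megabytes(megawords):
    """Convert a SET MEMORY/SIZE argument (MegaWords) to megabytes."""
    return megawords * BYTES_PER_WORD

print(megawords_to_megabytes(10))  # 80
```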
- Essential memory
- Cache memory
- Permanent memory
Essential memory is memory that is essential for the current command. For example, in a multi-argument command such as PLOT v1,v2,v3, memory equal to the sum of the sizes of the arguments v1, v2, and v3 is “essential”, because these three variables must be passed at the same time to the PLOT command. For compressing calculations, such as averages (T=@AVE) or modulo regridding (var[GT=month_reg@MOD]), the data that serves as input to the calculation is fleeting essential memory. An in-depth discussion of this topic follows below. For user-defined variables (those defined with the LET command), essential memory also includes ephemeral results that exist during the calculation: algebraic operands and function arguments. For example, in evaluating the definition LET C=A+B, the variables A and B must exist simultaneously as C is computed. In evaluating the definition LET C=FCN(v1,v2,v3), the variables v1, v2, and v3 will all fleetingly occupy memory simultaneously.
Cache memory holds the left-over results of previous disk reads and calculations. Ferret will frequently re-use these results, avoiding the need to read from disk or repeat calculations and thereby improving performance. For example, if a global plot, SHADE temp, is followed by the zoomed-in plot, SHADE temp[X=130e:80w,y=20s:20n], the second command will likely re-use the cached result left from the first. When the demand for essential memory grows high enough, cache memory will be deleted in order to free up needed memory space. The order in which cached results are deleted is based upon the time since a given cached object was last accessed.
Permanent memory is memory occupied by the LOAD/PERMANENT command. Permanent memory will not be deleted until the user manually issues a LOAD/TEMPORARY for the same memory object, or a CANCEL MEMORY/ALL or CANCEL MEMORY/PERMANENT is issued. Designating memory as permanent can improve the performance under some circumstances.
It is during compressing calculations that the most complex memory management occurs. For example, in the computation of a 1D average such as L=1:1000@AVE, the source data required for this calculation is roughly 1000 times larger than the result. If the source data is too large to fit into memory, then Ferret must break the calculation into fragments in order to carry it to successful completion.
The nature of the algorithms that guide Ferret in breaking up the calculation is most clearly understood by looking at a concrete example. (A Ferret session showing this example is visible under the SET MODE FRUGAL entry of this Users’ Guide.) Let's consider a file variable ‘V’ with no E or F axes; a new session, where no memory is yet in use; SET MEMORY/SIZE=10 has been specified; and the command LOAD V[i=1:1000,j=1:1000,k=1,l=1:100@ave] has just been issued. The result of this averaging calculation, L=1:100@AVE, will be a mere 1 MegaWord (1000x1000x1x1), but the source data needed to perform that calculation is 100 MegaWords (1000x1000x1x100).
The source data cannot fit into the set memory size of 10 MegaWords in a single block, so the calculation must be split into fragments for processing. Because the speed of file IO differs greatly by axis, Ferret optimizes for performance by breaking the calculation along the slowest suitable axis. Typically, the order of axes from fast to slow will be X-Y-Z-T-E-F, but SET DATA/ORDER and the ordering of ENSEMBLE and FMRC aggregations may affect this. So in our example, Ferret will first look to see if the T (L) axis is suitable for breaking up the calculation. Since T is also the axis of the average, as the calculation proceeds essential memory will be required to store two components for the averaging calculation: the sum of data values and the sum of T-axis weights. Managing two result-sized components will double the essential memory required to 2 MegaWords. Since our assumed scenario is that nothing else is occupying memory, the memory available to hold source data is 10-2 = 8 MegaWords. To split 100 MegaWords of source data into fragments that will fit into 8 MegaWords, each fragment can be at most 8/100 = 0.08 of the total. With the L axis being 100 points long, Ferret will split the calculation into fragments of length 0.08*100 = 8 points along the L axis. Those fragments will be sequentially processed to compute the result. The choice that Ferret has made to split up a calculation may be viewed by issuing the SHOW MEMORY/DIAGNOSTIC command after the calculation has completed.
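The arithmetic of this worked example can be reproduced with a short Python sketch. It only mirrors the numbers quoted above and is not Ferret's actual implementation; the function and parameter names are hypothetical, and the fragment count is simply derived from the 8-point fragment length rather than stated in this guide.

```python
def fragment_plan(limit_mw, result_mw, source_mw, axis_len, result_copies=2):
    """Return (points per fragment, number of fragments) along the split axis.

    All sizes are in MegaWords. result_copies=2 reflects the two result-sized
    components named in the text: the sum of values and the sum of weights.
    """
    essential_mw = result_copies * result_mw   # 2 MegaWords in the example
    available_mw = limit_mw - essential_mw     # 10 - 2 = 8 MegaWords
    # Largest whole number of axis points whose source data fits in available_mw.
    points = available_mw * axis_len // source_mw
    if points < 1:
        raise ValueError("not even one point along the axis fits in memory")
    fragments = -(-axis_len // points)         # ceiling division
    return points, fragments

print(fragment_plan(limit_mw=10, result_mw=1, source_mw=100, axis_len=100))
# (8, 13)
```

Under these assumptions the 100-point L axis is covered by twelve fragments of 8 points and a final fragment of 4 points.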
In the preceding example, the fragment size calculation will be precisely as described if MODE FRUGAL is canceled and nothing else is occupying memory. Using Ferret with MODE FRUGAL canceled makes maximum use of the SET MEMORY/SIZE limit that has been specified, but raises the chances that Ferret will run out of memory due to other factors. By default MODE FRUGAL is SET and has an argument value of 30, meaning that 30% of the available memory will be held in reserve when calculations of fragment size are made. The User Guide section on SET MODE FRUGAL provides output of the example session discussed above to illustrate the effect of MODE FRUGAL. Holding some memory in reserve is important because in a typical Ferret session precise essential memory needs are difficult to anticipate. For example, LET-defined variables require additional buffers of essential memory. (An additional factor of 0.25 would have been applied to the fragment size calculation of the preceding example if ‘V’ had been a LET variable instead of a file variable.) Ferret memory management is adaptive; the calculation of fragment size when splitting is based upon the reduced memory that is available at the moment the decision to split occurs.
Each memory object in Ferret occupies a “slot”. Under rare circumstances, when a very large number of small objects must be held in essential memory at once, competition for memory slots will require the deletion of cached objects in order to free up slots. The command SHOW MEMORY/FREE can be used to see the level of memory slot usage.
Underneath it all, Ferret allocates memory via malloc in C, and PyFerret allocates memory using Python memory management, but the use of memory for computation is the same in both.
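The effect of the reserve and of the LET factor on the example can be sketched in Python. Several things here are assumptions rather than documented behavior: that the 30% reserve is taken from the 8 MegaWords left after essential memory, that the 0.25 factor multiplies the same quantity, and that the result is rounded down to whole points. The session output in the SET MODE FRUGAL section remains the authoritative illustration; all names below are hypothetical.

```python
from fractions import Fraction
from math import floor

def fragment_points(limit_mw, essential_mw, source_mw, axis_len,
                    frugal_percent=30, let_variable=False):
    """Estimate the fragment length (in axis points) for the worked example."""
    available = Fraction(limit_mw - essential_mw)
    # Assumption: the MODE FRUGAL reserve is a percentage of what is available.
    usable = available * Fraction(100 - frugal_percent, 100)
    if let_variable:
        # Assumption: the extra factor for LET variables scales the same quantity.
        usable *= Fraction(1, 4)
    return floor(usable * axis_len / source_mw)

print(fragment_points(10, 2, 100, 100, frugal_percent=0))   # 8, MODE FRUGAL canceled
print(fragment_points(10, 2, 100, 100))                     # 5 under these assumptions
print(fragment_points(10, 2, 100, 100, let_variable=True))  # 1 under these assumptions
```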
9.3.2 Strategies for Handling ‘Request Exceeds Memory Setting’ Errors
For further understanding of the recommendations to follow see Understanding Ferret Memory Management Concepts and Tools, above. This section describes Ferret/PyFerret v7.2 and later. For earlier Ferret versions see section 9.3.3, below.
When an attempted calculation results in a “Request Exceeds Memory Setting” error, there are several actions that may yet enable the calculation to succeed:
- Increase the value given to SET MEMORY/SIZE=
- Eliminate memory occupied by LOAD/PERMANENT.
- Break up the argument list on the command. For example, replace the command SAVE/FILE=foo.nc V1,V2 with the two commands
SAVE/FILE=foo.nc V1; SAVE/APPEND/FILE=foo.nc V2
Similarly, PLOT V1,V2 can be replaced by PLOT V1; PLOT/OVERLAY V2
- Manually break SAVE operations into fragments.
- Example 1: exploit the netCDF record axis by replacing SAVE/file=foo.nc V[L=1:1000] with REPEAT/L=1:1000 SAVE/APPEND/file=foo.nc V
- Example 2: use the qualifiers /ILIMITS, /JLIMITS, etc. to replace SAVE/file=foo.nc V[K=1:1000] with SAVE/file=foo.nc/KLIMITS=1000 V[K=1]; REPEAT/K=2:1000 SAVE/APPEND/FILE=foo.nc V
- Avoid limits embedded inside LET definitions.
Limits embedded within LET definitions can thwart the split/gather strategy that Ferret uses to derive the result of a large memory calculation using a relatively smaller memory footprint. For example, consider a file variable, ‘v’, that has 1000 points on its time axis. Suppose we want to find the T-axis average of the square of this variable, v_squared[L=1:1000@ave]. Split/gather can function properly if v_squared is defined using LET v_squared = v^2, but it cannot function properly if limits are embedded inside the definition, as in LET v_squared = v[L=1:1000]^2.
With the embedded limits, split/gather would fail because, in an attempt to evaluate a single fragment of v_squared (say a fragment of length 5, v_squared[L=1:5]), the embedded L=1:1000 limits would override the attempt to acquire the smaller fragment of 5 points.
- Try to group together calculations that are on smaller dimensioned objects. For example, the expression VAR[i=1:100, j=1:100]*2*PI will make less efficient use of cpu and memory than the expression VAR[i=1:100, j=1:100]*(2*PI). The former multiplies each of the 10000 points of VAR by 2 and then performs a second multiplication of the 10000 result points by PI. The latter computes the scalar 2*PI and uses it only once in multiplying the 10000 points of VAR.
- Pre-compute intermediate values needed for evaluation of LET-defined pyramids.
Locate the memory-intensive portions of a complex pyramid of LET definitions and pre-compute these. If the intermediate result is small (e.g. the result of a huge volume average may be a single point), then LOAD/PERMANENT for this intermediate result may provide a simple solution. Otherwise, pre-compute and save these intermediate variables to disk. Either way, using the saved result to define a reduced pyramid will require less memory. For example, in calculations that involve deviations from a long-time mean, pre-computing the long-time mean and saving this to disk for reuse may greatly reduce the peak memory needs of the calculation.
- Pre-compute the arguments to grid-changing functions.
Grid-changing functions are those for which the grid of the function result is not identical to the grids of the arguments. The split/gather strategy that Ferret uses to derive the result of a large memory calculation using a relatively smaller memory footprint can only be transferred through a function to its arguments for those axes that are inferred from the arguments. The command SHOW FUNCTION/DETAILS functionName will reveal which axes meet this test. When the split/gather logic cannot be passed through the function, it may be necessary to pre-compute the function arguments, or the function itself.
- The RESHAPE function is a common culprit leading to Insufficient Memory errors. For effective memory management use of the RESHAPE function should be minimized. There are often alternative strategies which also shift between grids without altering data values, such as G=newAxis@ASN.
- Sometimes a higher value of SET MODE FRUGAL will allow the calculation to succeed.
9.3.3 Understanding Ferret Memory, for Ferret/PyFerret v7.1 and earlier
This section describes Ferret Memory Management in Ferret v7.1 and earlier.
Ferret indicates memory problems by issuing the error message "request exceeds memory setting" (or, in much-older versions of Ferret, "insufficient memory"). If memory is a problem when running Ferret, the following suggestions may help:
1) Use the command SET MEMORY/SIZE=nnn to increase the memory cache region available to Ferret. The number of words listed in the memory error message is the size of the request which failed; the total requirement for memory may be larger than shown. nnn is expressed in MWords. For operations that just load the data, nnn will need to be approximately the size in words that is quoted in the message, divided by 1.e6. If some of the current memory storage is already allocated for other variables, then the amount of memory reported is the additional amount needed for the last command. Use SHOW MEMORY to get the current setting, and then use SET MEMORY with the current setting, plus nnn. When reading non-NetCDF data (ascii or binary) files, a command to load a variable from the file will load all of the variables in the file over their whole range. The message about insufficient memory reports the amount needed for one variable, at the point where the allocated memory has been used up, so you will need a value on the order of nvars*nnn.
To increase memory every time you start Ferret, add a SET MEMORY/SIZE= command to your .ferret startup file. Caution: As an example, SET MEMORY /SIZE=200 allocates 200 * 8 * 1M = 1.6G of memory, so don't put that command in your $HOME/.ferret start-up file unless you have many gigabytes of memory on your system and regularly work with datasets of this size. Ferret v7.1 and lower will allocate and hold onto that amount of memory for the duration of the Ferret session.
2) Use the command SET MODE DESPERATE to determine the threshold size of memory objects at which Ferret will break a large calculation into fragments. A smaller argument value will induce stricter memory management but at a penalty in performance.
3) Use CANCEL MEMORY whenever you are sure that the data referenced thus far by Ferret will not be referenced again. This is particularly appropriate to batch procedures that use Ferret. This eliminates any memory fragmentation that may be left by previous commands.
4) Use CANCEL MODE SEGMENTS to minimize the memory usage by graphics (on a few X-window systems this may prevent windows from being restored after they are obscured).
5) When using DEFINE VARIABLE (alias LET) avoid embedding upper and lower axis bounds within the variable definition. Ferret cannot split up large calculations along axes when the limits are fixed in the definition. For example,
yes? LET V2=TEMP/10
yes? PLOT/K=1:10 V2
is preferable to
yes? LET V2=TEMP[K=1:10]/10
yes? PLOT V2
6) Try to group together calculations that are on smaller dimensioned objects. For example, the expression VAR[i=1:100, j=1:100]*2*PI will make less efficient use of cpu and memory than the expression VAR[i=1:100, j=1:100]*(2*PI). The former multiplies each of the 10000 points of VAR by 2 and then performs a second multiplication of the 10000 result points by PI. The latter computes the scalar 2*PI and uses it only once in multiplying the 10000 points of VAR.
7) After complex plots using viewports, use CANCEL VIEWPORTS to clear graphics memory.
8) If one has SET MODE STUPID:weak_cache, then make sure that the region is fully defined (i.e., check SHOW REGION and check the region qualifiers of your command). When the region along some axis is not specified Ferret defaults to the full span of the data along that axis and is unable to optimize memory usage.
Discussion of SHOW MEMORY outputs, for Ferret v7.1 and lower:
SHOW MEMORY/FREE
Cache memory is organized into "blocks." One block is the smallest unit that any variable stored in memory may allocate. The total number of variables that may be stored in memory cannot exceed the size of the memory table. The "largest free region" gives an indication of memory fragmentation. A typical SHOW MEMORY/FREE output looks as below:
yes? show memory/free
total memory table slots: 150
total memory blocks: 500
memory block size:1600
number of free memory blocks: 439
largest free region: 439
number of free regions: 1
free memory table slots: 149
Discussion of MODE DESPERATE, Ferret v7.1 and earlier:
Recall that MODE DESPERATE has no effect for Ferret/PyFerret version 7.2 and higher. Ferret checks the size of the component data required for a calculation in advance of performing the calculation. If the size of the component data exceeds the value of the MODE DESPERATE argument Ferret attempts to perform the calculation in pieces.
For example, the calculation "LIST/I=1/J=1 U[K=1:100,L=1:1000@AVE]" requires 100*1000=100,000 points of component data although the result is only a line of 100 points on the K axis. If 100,000 exceeds the current value of the MODE DESPERATE argument Ferret splits this calculation into smaller sized chunks along the K axis, say, K=1:50 in the first chunk and K=51:100 in the second.
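The threshold test in this example can be sketched in Python. The sketch assumes that the calculation is divided into the fewest equal-sized chunks that each fall at or below the threshold; this guide does not state how Ferret actually chooses chunk boundaries, so this is an illustration only, and the function name is hypothetical.

```python
def split_along_axis(axis_len, points_per_index, threshold):
    """Return inclusive 1-based index ranges along the split axis.

    axis_len is the length of the axis being split (K in the example) and
    points_per_index is the component data behind each index (1000 L points).
    """
    total = axis_len * points_per_index
    if total <= threshold:
        return [(1, axis_len)]                   # no splitting needed
    n_chunks = -(-total // threshold)            # ceiling division
    chunk_len = -(-axis_len // n_chunks)
    return [(lo, min(lo + chunk_len - 1, axis_len))
            for lo in range(1, axis_len + 1, chunk_len)]

print(split_along_axis(100, 1000, 80000))  # [(1, 50), (51, 100)]
```

With a threshold of 80000, the 100,000 points of component data are split into the two chunks K=1:50 and K=51:100 used in the example above.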
Ferret is also sensitive to the performance penalties associated with reading data from the disk. Splitting the calculation along the axis of the stored data records can require the data to be read many times in order to complete the calculation. Ferret attempts to split calculations along efficient axes, and will split along the axis of stored data only in desperation, if MODE DESPERATE is SET.
Example:
yes? SET MODE DESPERATE:5000
default state: canceled (default argument: 80000)
Note: Use MODE DIAGNOSTIC to see when splitting is occurring.
Arguments
Use SHOW MEMORY/FREE to see the total memory available (as set with SET MEMORY/SIZE).
Whenever the size of memory is set using SET MEMORY, the MODE DESPERATE argument is reset to one tenth of the memory size. For most purposes this will be an appropriate value. Users may, at their discretion, raise or lower the MODE DESPERATE value based on the nature of a calculation. A complex calculation, with many intermediate variables, may require a smaller value of MODE DESPERATE to avoid an "insufficient memory" error. A simple calculation, such as the averaging operation described above, will typically run faster with a larger MODE DESPERATE value. The upper bound for the argument is the size of memory. The lower bound is "memory block size."
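The relationship between these values can be checked against the sample SHOW MEMORY/FREE output shown earlier in this section. The Python sketch below assumes that the total memory size in words equals the number of memory blocks multiplied by the block size; that is an inference from the sample output, not a documented formula, and the function name is hypothetical.

```python
def desperate_bounds(total_blocks, block_size):
    """Return (lower bound, default argument, upper bound) in words."""
    memory_words = total_blocks * block_size   # assumed total memory size
    default = memory_words // 10               # "one tenth of the memory size"
    return block_size, default, memory_words

print(desperate_bounds(500, 1600))  # (1600, 80000, 800000)
```

With 500 blocks of 1600 words, the default comes out at 80000, which matches the default argument quoted for MODE DESPERATE above.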