The CSIRO netCDF/OPeNDAP interface to matlab

Version 5.11

11 September 2013

Introduction

Summary of functions

Installation

Customization

Drivers

Portability and known problems

Revision history

Alternative ways of accessing netCDF and OPeNDAP data

Disclaimer

People involved in the development of the interface

Contact details


Introduction

The CSIRO interface is used in a matlab session to retrieve data from either a local netCDF file or via an OPeNDAP/DODS server. The same commands are used for either type of access in almost every case (some small differences are discussed here).

The interface has options for automatically handling missing values, scale factors and the permutation of hyperslabs. It also has a simple syntax.

The method of installing the software is described in the Installation section.

There are other ways of accessing netCDF files and OPeNDAP/DODS data; links to some of them are given in the section Alternative ways of accessing netCDF and OPeNDAP data.


Summary of functions

Basic functions

There are ten basic functions which are commonly used. They allow users to access locally held netCDF files or to retrieve data via an OPeNDAP/DODS server.

If dealing with a netCDF file then the first argument to each function will be a file name. For example, '/home/netcdf-data/sst_cac_recon_ltm.nc' is a full file name (including a path) to a test netCDF file at the CSIRO Marine Labs. The same file is available via an OPeNDAP/DODS server with the url 'http://www.marine.csiro.au/dods/nph-dods/dods-data/climatology-netcdf/sst_cac_recon_ltm.nc'. In the examples that follow we will use this test file.

The basic functions are:

For a more detailed description of the functions and some examples just follow the links. For an introduction it is suggested that you look at the functions in the order that they are listed above. Of course documentation is also available using the matlab help facility.

Auxiliary functions

There are also some auxiliary functions which are listed below. The higher-level ones are in the main directory and you can use the matlab help facility for a more detailed description of them. You will not usually want to call these directly, although some of the time-related functions may be useful on occasion. There are also other functions in the private directory and these will be very rarely needed.

Those in the main directory:

Those in the private directory:

attnc

attnc returns selected attributes of a netcdf file or DODS/OPeNDAP dataset. The general form of an attnc call is:

[att_val, att_name_list, access_function] = attnc(file, var_name, att_name, verbose, preserve_type, access_function_in);

Input arguments:

  1. file: This is a string containing the name of the netCDF file or the URL to the OPeNDAP dataset. It does not have a default.
  2. var_name: a string containing the name of the variable whose attribute is required. If it is not passed then it is assumed to be 'global' and referring to global attributes.
  3. att_name: a string containing the name of the attribute that is required. If it is not specified then it is assumed that the user wants all of the attributes.
  4. verbose: if verbose == 0 (the default) then no messages about the attributes are displayed. Otherwise some simple messages will be displayed.
  5. preserve_type: if preserve_type == 0 (the default) then any numbers in att_val will be returned as doubles. For matlab version 7.7 and higher it is possible to preserve the original numeric type in the netcdf file; this is done when preserve_type == 1.
  6. access_function_in: this is a string that controls the method used to read the file or opendap url. Possible values are:

Output arguments:

  1. att_val: if att_name is specified then its value is returned in att_val. If att_name is not specified then the values of all of the attributes are returned in the cell att_val.
  2. att_name_list: if att_name is specified then the same name is returned in att_name_list provided that the attribute is found (otherwise it is empty). If att_name is not specified then the names of all of the attributes are returned in the cell att_name_list.
  3. access_function: the access function used for the given file, which depends on what is available. It may be 'netcdf_api', 'java' or 'none', where 'none' means that the file cannot be read.

Examples

In the following examples we use our standard OPeNDAP file test_1.nc.

>> file = 'http://www.marine.csiro.au/dods/nph-dods/dods-data/test_data/test_1.nc';
>> [att_val, att_name_list] = attnc(file, 'u');
>> length(att_val)
ans =
10
>> att_val{1}
ans =
u,5_januarys
>> att_name_list{1}
ans =
long_name

Here we retrieve all of the attributes for the variable u. We see that there are 10 elements in each cell and that the first attribute has name long_name and is a string containing u,5_januarys.

By not giving a variable or attribute name we get information about all of the global attributes.

>> file = 'http://www.marine.csiro.au/dods/nph-dods/dods-data/test_data/test_1.nc';
>> [att_val, att_name_list] = attnc(file);
>> att_name_list
att_name_list =
'source'
>> att_val
att_val =
'Test program'

In this case there is only one global attribute named source and it is a string containing Test program.

By giving the variable and attribute names we can get just the value of the attribute.

>> [att_val, att_name_list] = attnc(file, 'u', '_FillValue');
>> att_val
att_val =
1.0000e+16

A single global attribute can be retrieved by using the name 'global' in the call to attnc as below.

>> [att_val, att_name_list] = attnc(file, 'global', 'source');
>> att_val
att_val =
Test program
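The return conventions of attnc (parallel lists of values and names when no attribute is named, a single value when one is named, and an empty result when the attribute is absent) can be mimicked in a few lines. This is a Python sketch over a hypothetical in-memory attribute table, not the CSIRO implementation; the name attnc_sketch and the table (filled in from the test_1.nc examples above) are illustrative assumptions.

```python
from typing import Any, Dict, Optional, Tuple

# Hypothetical in-memory attribute table standing in for a dataset:
# variable name (or 'global') -> attribute name -> value, in file order.
ATTRIBUTES: Dict[str, Dict[str, Any]] = {
    "global": {"source": "Test program"},
    "u": {"long_name": "u,5_januarys", "units": "cm/sec", "_FillValue": 1.0e16},
}


def attnc_sketch(var_name: str = "global", att_name: Optional[str] = None) -> Tuple[Any, Any]:
    """Mimic the documented (att_val, att_name_list) conventions of attnc."""
    atts = ATTRIBUTES.get(var_name, {})
    if att_name is None:
        # No attribute named: all values and all names, as parallel lists.
        return list(atts.values()), list(atts.keys())
    if att_name in atts:
        # Attribute found: its value, with the name echoed back.
        return atts[att_name], att_name
    # Attribute not found: empty results (matlab would return []).
    return [], []
```

Pairing the two lists, as in dict(zip(names, vals)) after vals, names = attnc_sketch("u"), gives a name-to-value mapping equivalent to reading att_name_list{k} alongside att_val{k}.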

ddsnc

ddsnc returns information about a netcdf file or DODS/OPeNDAP dataset. The general form of a ddsnc call is:

desc = ddsnc(file, access_function_in)

Input arguments:

  1. file: This is a string containing the name of the netCDF file or the URL to the OPeNDAP dataset. It does not have a default.
  2. access_function_in: this is a string that controls the method used to read the file or opendap url. Possible values are:

Output arguments:

  1. desc is a matlab structure. For an OPeNDAP data set desc will contain all of the information in the DDS (Dataset Description Structure). For a netCDF file desc will be almost identical. (It cannot be exactly the same since netCDF files are not identical to OPeNDAP data sets.)

Examples

>> file = 'http://www.marine.csiro.au/dods/nph-dods/dods-data/test_data/test_1.nc';
>> desc = ddsnc(file)
desc =
variable: [1x14 struct]
dimension: [1x5 struct]

desc has 2 fields - variable and dimension. Looking at one element we see

>> desc.variable(2)
ans =
type: 'Float32'
name: 'u'
dim_statement: {'depth1 = 12' 'depth2 = 11'}
dim_idents: [2x1 double]

The first 2 fields tell us that the variable is named 'u' and is a 32-bit float (single precision real). The dim_statement field tells us that the u variable has 2 dimensions in the order given. For dim_idents we see

>> desc.variable(2).dim_idents
ans =
2
3

These integers refer to the dimensions of the u array. Looking at desc.dimension(2) and desc.dimension(3) we see

>> desc.dimension(2)
ans =
name: 'depth1'
length: 12
>> desc.dimension(3)
ans =
name: 'depth2'
length: 11

That is, index 2 points to the 2nd dimension, depth1, which has length 12. (We saw the same information in the dim_statement field earlier.) A generic program could then retrieve the information by setting:

>> ii = desc.variable(2).dim_idents(1);

and then referring to desc.dimension(ii).
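The same dim_idents lookup can be written generically for any variable. The Python sketch below mirrors the desc structure with plain dictionaries (an assumed stand-in, not the actual type ddsnc returns, and with only two of the 14 variables filled in from the test_1.nc examples) and converts matlab's 1-based indices to Python's 0-based ones. The function name variable_dims is hypothetical.

```python
from typing import Any, Dict, List, Tuple

# Plain-dictionary stand-in for the ddsnc output for test_1.nc
# (only two of its 14 variables are included).
desc: Dict[str, List[Dict[str, Any]]] = {
    "dimension": [
        {"name": "dim_unlmited", "length": 3},
        {"name": "depth1", "length": 12},
        {"name": "depth2", "length": 11},
        {"name": "dim3", "length": 3},
        {"name": "dim4", "length": 4},
    ],
    "variable": [
        {"name": "time", "dim_idents": [1]},
        {"name": "u", "dim_idents": [2, 3]},
    ],
}


def variable_dims(desc: Dict[str, List[Dict[str, Any]]], var_name: str) -> List[Tuple[str, int]]:
    """Return (dimension name, length) pairs for var_name, in storage order."""
    dims = desc["dimension"]
    for var in desc["variable"]:
        if var["name"] == var_name:
            # dim_idents are 1-based matlab indices; Python lists are 0-based.
            return [(dims[ii - 1]["name"], dims[ii - 1]["length"]) for ii in var["dim_idents"]]
    raise KeyError(f"{var_name} is not a variable in desc")
```

For u this yields depth1 (12) followed by depth2 (11), matching the dim_statement field shown above.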



get_csiro_access_functions

get_csiro_access_functions finds the names of the access functions used in the matlab/netcdf interface. The user can change the access functions with a call to set_csiro_access_functions. The general form of a get_csiro_access_functions call is:

[access_function_local, access_function_opendap] = get_csiro_access_functions;

Output arguments:

  1. access_function_local: the access function used for local files. The acceptable values are 'netcdf_api' and 'java'.
  2. access_function_opendap: the access function used for opendap files. The acceptable values are 'netcdf_api' and 'java'.

getnc

Introduction

getnc retrieves data in two ways. It can be used interactively to retrieve data from a netCDF file.

getnc is more commonly used as a function call - it can then retrieve data from both netCDF and OPeNDAP files. Because many options are available getnc can take up to 14 input arguments (although most have default values). To make things easier for the user there are various ways of specifying these arguments. Finally, a number of examples are given.

Interactive use

To retrieve data interactively the user simply types in 

>> val = getnc(file);

where file is a string containing the name of the netCDF file. From there the user is prompted for more information.

Arguments - meanings and defaults

There are 14 arguments that getnc must know. Don't be frightened, however, as there are some easy ways to specify them and all but two have defaults. The arguments are:

  1. file: This is a string containing the name of the netCDF file or the URL to the OPeNDAP dataset. It does not have a default. If describing a netCDF file it is permissible to drop the ".nc" suffix but this is not recommended.

  2. varid: This may be a string or an integer. If it is a string then it should be the name of the variable in the netCDF file or OPeNDAP dataset. The use of an integer is a deprecated way of accessing netCDF file data; if used, the integer must be the menu number of the n-dimensional variable as shown by a call to inqnc.

  3. bl_corner: This is a vector of length n specifying the hyperslab corner with the lowest index values (the bottom left-hand corner in a 2-space).  The corners refer to the dimensions in the same order that these dimensions are listed in the inqnc description of the variable. For a netCDF file this is the same order that they are returned in a call to "ncdump". With an OPeNDAP dataset it is the same order as in the DDS. Note also that the indexing starts with 1 - as in matlab and fortran, NOT 0 as in C. A negative element means that all values in that direction will be returned.  If a negative scalar (or an empty array) is used this means that all of the elements in the array will be returned. This is the default, i.e., all of the elements of varid will be returned.

  4. tr_corner: This is a vector of length n specifying the hyperslab corner with the highest index values (the top right-hand corner in a 2-space). A negative element means that the returned hyperslab should run to the highest possible index (this is the default). Note, however, that the value of an element in the tr_corner vector will be ignored if the corresponding element in the bl_corner vector is negative.

  5. stride: This is a vector of length n specifying the interval between accessed values of the hyperslab (sub-sampling) in each of the n dimensions. A value of 1 accesses adjacent values in the given dimension; a value of 2 accesses every other value; and so on. If no sub-sampling is required in any direction then it is allowable to just pass the scalar 1 (or -1 to be consistent with the bl_corner and tr_corner notation). Note, however, that the value of an element in the stride vector will be ignored if the corresponding element in the bl_corner vector is negative.

  6. order: 

  7. change_miss: Missing data are indicated by the attributes _FillValue, missing_value, valid_range, valid_min and valid_max. The action to be taken with these data is determined by change_miss.

  8. new_miss: This is the value given to missing data if change_miss == 3.

  9. squeeze_it: This specifies whether the matlab function "squeeze" should be applied to the returned array. This will eliminate any singleton array dimensions and possibly cause the returned array to have fewer dimensions than the full array.

  10. rescale_opts: This is a 2 element vector specifying whether or not rescaling is carried out on retrieved variables and certain attributes. The relevant attributes are '_FillValue', 'missing_value', 'valid_range', 'valid_min' and 'valid_max'; they are used to find missing values of the relevant variable. The option was put in to deal with files that do not follow the netCDF conventions (usually because the creator of the file has misunderstood the convention). For further discussion of the problem see here. Only use this option if you are sure that you know what you are doing.

  11. err_opt: This is an integer that controls the error handling in a call to getnc.

  12. output_type:  This string determines the type of the returned variable.
  13. file_status:
  14. access_function: This is a string that controls the method used to read the file or opendap url. Possible values are:

Arguments - ways of specifying

Specifying up to 14 arguments to getnc can be complicated and confusing. To make the process easier getnc will accept a variety of types of input. These are given as follows:

>> values = getnc(file, varid, bl_corner, tr_corner, stride, order, change_miss, new_miss, squeeze_it, rescale_opts, err_opt, output_type, file_status, access_function_in);

>> values = getnc(file, varid);

If you want non-default behaviour for one or more of the later arguments then you can do something like:

>> values = getnc(file, varid, -1, -1, -1, -1, change_miss, new_miss);

In this case 4 arguments are specified (file, varid, change_miss and new_miss) and default values are used for the other 10.

>> x.file = 'fred.nc';
>> x.varid = 'foo';
>> x.change_miss = 1;
>> values = getnc(x);

This specifies 3 arguments and causes defaults to be used for the other 11.
Note that it is possible to mix the usual arguments with the passing of a structure - it is only necessary that the structure be the last argument passed. We could achieve the same effect as above by doing:

>> x.change_miss = 1;
>> values = getnc('fred.nc', 'foo', x);
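The argument-resolution rules just described (positional arguments in a fixed order, -1 meaning "use the default", and an optional structure as the last argument) can be sketched in Python, with a dict playing the role of the matlab structure. This is not getnc's actual code: resolve_getnc_args is a hypothetical name, only the defaults stated in this document are filled in (None marks a default not given here), and treating -1 in any position as "use the default" is an assumption based on the examples.

```python
from typing import Any, Dict

# Argument order as listed under "Arguments - meanings and defaults".
GETNC_ARG_NAMES = [
    "file", "varid", "bl_corner", "tr_corner", "stride", "order",
    "change_miss", "new_miss", "squeeze_it", "rescale_opts", "err_opt",
    "output_type", "file_status", "access_function",
]

# Defaults stated in this document; None marks a default not given here.
GETNC_DEFAULTS: Dict[str, Any] = {
    "file": None, "varid": None, "bl_corner": -1, "tr_corner": -1,
    "stride": -1, "order": -1, "change_miss": 2, "new_miss": None,
    "squeeze_it": 1, "rescale_opts": None, "err_opt": 2,
    "output_type": None, "file_status": None, "access_function": None,
}


def resolve_getnc_args(*args: Any) -> Dict[str, Any]:
    """Hypothetical helper: merge positional arguments and a trailing dict."""
    opts = dict(GETNC_DEFAULTS)
    positional = list(args)
    overrides: Dict[str, Any] = {}
    if positional and isinstance(positional[-1], dict):
        # The structure (here a dict) may only be the last argument.
        overrides = positional.pop()
    if len(positional) > len(GETNC_ARG_NAMES):
        raise TypeError("too many arguments")
    for name, value in zip(GETNC_ARG_NAMES, positional):
        # Assumption: an integer -1 in any position means "use the default".
        if not (isinstance(value, int) and value == -1):
            opts[name] = value
    for name, value in overrides.items():
        if name not in opts:
            raise KeyError(f"unknown getnc option: {name}")
        opts[name] = value
    return opts
```

Under these assumptions, resolve_getnc_args("fred.nc", "foo", {"change_miss": 1}) and resolve_getnc_args({"file": "fred.nc", "varid": "foo", "change_miss": 1}) produce identical settings, which is the equivalence claimed above.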

Examples

In the following examples we use our standard OPeNDAP file "http://www.marine.csiro.au/dods/nph-dods/dods-data/test_data/test_1.nc" to illustrate the usage of getnc.

The simplest command line call to make is the following:

>> file = 'http://www.marine.csiro.au/dods/nph-dods/dods-data/test_data/test_1.nc';
>> u = getnc(file, 'u');

The first argument specified is the file name or url. The second argument is the name of the variable - we could have found this by using inqnc. The result is that the entire contents of the u variable will be returned to the matlab session.

Alternatively we could have passed a structure to getnc to get the same answer.

>> x.file = 'http://www.marine.csiro.au/dods/nph-dods/dods-data/test_data/test_1.nc';
>> x.varid = 'u';
>> u = getnc(x);

We may only want a part of the variable and that is what the 3 arguments (bl_corner, tr_corner, stride) are about. If we use inqnc to consider the u variable described in our example file we see that it has two dimensions (depth1 depth2) in that order. We say that the variable is in a 2-dimensional rectangle. We also saw that there are 12 and 11 points in each of the directions. Thus we can imagine extracting a subset of the data known as a hyperslab. The argument bl_corner specifies the bottom left-hand corner of the hyperslab, tr_corner specifies the top right-hand corner and stride specifies the sampling done. An example to illustrate this is shown below.

>> u = getnc(file, 'u', [-1 3], [-1 9], [-1 2]);
>> size(u)
ans =
12 4

The 1st element in each of these arguments is -1 to indicate that we want to retrieve every point in that direction. Hence the 1st dimension of u is of length 12 – the full number of elements in the depth1 dimension. Now bl_corner(2) = 3, tr_corner(2) = 9 and stride(2) = 2. This means that in the depth2 direction we want every second point from the 3rd to the 9th, i.e., points 3, 5, 7 and 9. Hence the 2nd dimension of u is of length 4.
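The selection rule can be written down one dimension at a time. The Python sketch below computes the 1-based indices that bl_corner, tr_corner and stride pick out, and hence the size of the returned hyperslab, following the rules listed under "Arguments - meanings and defaults". The function names are hypothetical, and treating a negative stride element as 1 is an assumption.

```python
from typing import List, Sequence, Union

Arg = Union[int, Sequence[int]]


def _expand(arg: Arg, ndims: int) -> List[int]:
    """Turn a scalar or empty argument into one entry per dimension."""
    if isinstance(arg, int):
        return [arg] * ndims
    if len(arg) == 0:
        return [-1] * ndims  # an empty array behaves like a negative scalar
    if len(arg) != ndims:
        raise ValueError("argument length does not match the number of dimensions")
    return list(arg)


def hyperslab_indices(length: int, bl: int, tr: int, stride: int) -> List[int]:
    """1-based indices selected along one dimension of the given length."""
    if bl < 0:
        # Negative bl_corner: every point; tr_corner and stride are ignored.
        return list(range(1, length + 1))
    top = length if tr < 0 else tr  # negative tr_corner: run to the end
    step = 1 if stride < 0 else stride  # assumption: negative stride means 1
    if not 1 <= bl <= top <= length or step < 1:
        raise ValueError("hyperslab out of range")
    return list(range(bl, top + 1, step))


def hyperslab_shape(shape: Sequence[int], bl_corner: Arg, tr_corner: Arg, stride: Arg) -> List[int]:
    """Size of the hyperslab returned for an array of the given shape."""
    n = len(shape)
    return [
        len(hyperslab_indices(length, b, t, s))
        for length, b, t, s in zip(shape, _expand(bl_corner, n), _expand(tr_corner, n), _expand(stride, n))
    ]
```

For the u example, hyperslab_indices(11, 3, 9, 2) gives [3, 5, 7, 9] and hyperslab_shape([12, 11], [-1, 3], [-1, 9], [-1, 2]) gives [12, 4], the size reported by matlab above.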

The next argument to discuss is order. In general it is best not to use this option and just use the default (-1). The option allows you to reverse the order of the dimensions in the returned value. Since netCDF files store data in row-major order but matlab uses column-major order, it is possible, in principle, to gain some efficiency when retrieving data from a local netCDF file. However this is rarely significant and the option is only retained for backwards compatibility with older versions of getnc. (For OPeNDAP files setting order = -2 is always less efficient than -1.)

The following example illustrates this.

>> u = getnc(file, 'u'); 
>> size(u)
ans =
12 11
>> ut = getnc(file, 'u', -1, -1, -1, -2);
>> size(ut)
ans =
11 12

Note that in the 2nd case we have used -1, -1, -1 for the bl_corner, tr_corner and stride arguments to indicate that we want the default case of getting all possible values. We could have passed a structure to get the same result, as below:

>> x.file = 'http://www.marine.csiro.au/dods/nph-dods/dods-data/test_data/test_1.nc';
>> x.varid = 'u';
>> x.order = -2;
>> ut = getnc(x);
>> size(ut)
ans =
11 12
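The reason reversing the dimensions can be cheap is that a row-major buffer holding a 12 x 11 array is, byte for byte, a column-major buffer holding its 11 x 12 transpose. The Python sketch below demonstrates this with a synthetic buffer (the values are made up and the function names are hypothetical); it illustrates the storage-order argument and is not how getnc reads data.

```python
from typing import List, Sequence, Tuple


def read_row_major(flat: Sequence[float], shape: Tuple[int, int]) -> List[List[float]]:
    """Interpret a flat buffer as 2-D, last index varying fastest (netCDF/C order)."""
    rows, cols = shape
    return [[flat[r * cols + c] for c in range(cols)] for r in range(rows)]


def read_column_major(flat: Sequence[float], shape: Tuple[int, int]) -> List[List[float]]:
    """Interpret a flat buffer as 2-D, first index varying fastest (matlab/Fortran order)."""
    rows, cols = shape
    return [[flat[c * rows + r] for c in range(cols)] for r in range(rows)]


def transpose(a: List[List[float]]) -> List[List[float]]:
    """Swap the two dimensions of a nested-list matrix."""
    return [list(col) for col in zip(*a)]
```

Reading the same buffer as read_row_major(buf, (12, 11)) and as read_column_major(buf, (11, 12)) gives arrays that are transposes of each other, so the reversed-dimension result needs no rearrangement of the stored values.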

The default behaviour of getnc is to replace missing values in the data with NaNs. (By missing values we mean those values equal to the _FillValue or missing_value attribute or outside the range determined by the valid_min, valid_max or valid_range attribute. This is discussed in the netCDF user's guide at http://www.unidata.ucar.edu/software/netcdf/docs/netcdf/Attribute-Conventions.html#Attribute-Conventions.) The pair of arguments change_miss and new_miss can change this. If change_miss = 1 then any missing values are returned unchanged. If change_miss = 2 then they are changed to a NaN (the default, also available as change_miss = -1). If change_miss = 3 then any missing values are replaced by new_miss.

This is illustrated in the following example – note that we pass a structure, x, here and have made sure that x is empty at the start.

>> x = [];
>> x.file = 'http://www.marine.csiro.au/dods/nph-dods/dods-data/test_data/test_1.nc';
>> x.varid = 'u';
>> x.bl_corner = [12 11];
>> x.tr_corner = [12 11];
>> u = getnc(x)
u =
NaN

We use the simplest version of getnc to retrieve the last value of the array – we get a NaN because the value actually stored in the dataset is marked as a missing value. Next we try change_miss = 1:

>> x.change_miss = 1;
>> u = getnc(x)
u =
3.0000e+16

Now 3.0000e+16 is returned: this is the missing value stored in the file, after the scale_factor (3) and add_offset (0.5) attributes of u have been applied. Finally, we use change_miss = 3 to cause the missing value to be replaced by 1.5 in our matlab array.

>> x.change_miss = 3;
>> x.new_miss = 1.5;
>> u = getnc(x)
u =
1.5000
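The missing-value test and the three change_miss actions can be summarised in a short Python sketch. It is an illustration of the rules stated above, not getnc's implementation: the function names are hypothetical, scale_factor and add_offset are ignored, and letting valid_range take precedence over valid_min and valid_max is an assumption (the netCDF conventions say they should not be combined).

```python
from typing import Any, Dict, List, Optional, Sequence


def is_missing(value: float, atts: Dict[str, Any]) -> bool:
    """True if value is flagged as missing by the conventional attributes."""
    if "_FillValue" in atts and value == atts["_FillValue"]:
        return True
    if "missing_value" in atts and value == atts["missing_value"]:
        return True
    low, high = atts.get("valid_min"), atts.get("valid_max")
    if "valid_range" in atts:
        low, high = atts["valid_range"]  # assumption: valid_range wins
    if low is not None and value < low:
        return True
    if high is not None and value > high:
        return True
    return False


def apply_change_miss(values: Sequence[float], atts: Dict[str, Any],
                      change_miss: int = 2, new_miss: Optional[float] = None) -> List[Any]:
    """Return values with missing entries handled as change_miss specifies."""
    if change_miss == -1:
        change_miss = 2  # -1 is an alias for the default (NaN)
    if change_miss not in (1, 2, 3):
        raise ValueError(f"unknown change_miss: {change_miss}")
    out: List[Any] = []
    for v in values:
        if change_miss == 1 or not is_missing(v, atts):
            out.append(v)  # unchanged
        elif change_miss == 2:
            out.append(float("nan"))  # replaced by NaN
        else:
            out.append(new_miss)  # replaced by new_miss
    return out
```

With atts = {"_FillValue": 1.0e16}, apply_change_miss([1.0e16], atts, 3, 1.5) returns [1.5], mirroring the last matlab example.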

The next argument, squeeze_it, deals with singleton dimensions (i.e., those of length 1). If squeeze_it = 1 (the default behaviour) then any singleton dimension will be eliminated as if the matlab function squeeze had been applied. If squeeze_it = 0 then the singleton dimensions will remain. This is illustrated in the following examples.

>> big_var = getnc(file, 'big_var', [-1 2 2 5 -1], [-1 2 2 5 -1]);
>> size(big_var)
ans =
12 3
>> big_var = getnc(file, 'big_var', [-1 2 2 5 -1], [-1 2 2 5 -1], -1, -1, -1, -1, 0);
>> size(big_var)
ans =
3 1 1 1 12

This option is not really necessary any more because matlab has the squeeze function. It was originally put in to enable backwards compatibility with earlier versions of getnc written before matlab dealt with multi-dimensional arrays and so we are stuck with it.

From version 3.3 onwards getnc has given the user some control over error handling. In the examples below we ask for a non-existent variable. The default behaviour (err_opt == 2) returns an empty array and prints a warning message as below.

>> junk = getnc(file, 'junk')
WARNING: junk is not a variable in http://www.marine.csiro.au/dods/nph-dods/dods-data/test_data/test_1.nc
junk =
[]

Setting err_opt == 1 causes getnc to abort with an error because the variable does not exist, as seen below.

>> x = []; 
>> x.err_opt = 1;
>> junk = getnc(file, 'junk', x)
??? Error using ==> getnc_s>error_handle
ERROR: junk is not a variable in http://www.marine.csiro.au/dods/nph-dods/dods-data/test_data/test_1.nc
Error in ==> getnc_s at 872
values = error_handle([], mess_str, [], err_opt);
Error in ==> getnc at 211
values = getnc_s(varargin);

Finally, we can see the dangerous option err_opt == 3, which causes an empty array to be returned with no error message.

>> x.err_opt = 3; 
>> junk = getnc(file, 'junk', x)
junk =
[]

This might be used when getnc is called in a loop and you don't want to get a large number of error messages. Of course you should be careful to handle the returned values properly.
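The three err_opt behaviours map naturally onto raise, warn-and-return-empty, and return-empty-silently. The Python sketch below shows that mapping; handle_error and GetncError are hypothetical names, and using Python's warnings module for the err_opt == 2 message is a choice made for the sketch.

```python
import warnings
from typing import List


class GetncError(Exception):
    """Hypothetical error type raised when err_opt == 1."""


def handle_error(message: str, err_opt: int = 2) -> List:
    """Apply the documented err_opt policy to an error message."""
    if err_opt == 1:
        raise GetncError("ERROR: " + message)  # abort the call
    if err_opt == 2:
        warnings.warn("WARNING: " + message)  # default: warn, return empty
        return []
    if err_opt == 3:
        return []  # silent: the caller must check for an empty result
    raise ValueError(f"unknown err_opt: {err_opt}")
```

In a loop over many variables, err_opt == 3 keeps the output quiet, so every iteration should test whether the result is empty before using it.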


inqnc/enqnc

inqnc and enqnc are two slightly different versions of an interactive function that is used to find out about the structure of a netCDF file or OPeNDAP dataset. (In the latter case you could use a web browser for the same purpose.)

enqnc is an older, command-line driven version that some people prefer; it is described below. inqnc returns the same information but uses pop-ups to ask the user questions. The general forms of the calls are:

inqnc(file, access_function_in, menu_type)

enqnc(file, access_function_in)

Input arguments:

  1. file: This is a string containing the name of the netCDF file or the URL to the OPeNDAP dataset. It does not have a default.
  2. access_function_in: A string that controls the method used to read the file or opendap url. It is not required; possible values are:
  3. menu_type: this is not required and is used to specify the type of menu that the user sees.
    menu_type == 1 (the default) gives pop-up menus,
    menu_type == 2 gives command-line menus (like the older version).

Try clicking on the url 'http://www.marine.csiro.au/dods/nph-dods/dods-data/test_data/test_1.nc'  to see a typical structure. The same information is found in the matlab example below. Of course, the output from enqnc will be almost identical if we look at the netCDF file on a local disk.

Example

>> file = 'http://www.marine.csiro.au/dods/nph-dods/dods-data/test_data/test_1.nc';
>> inqnc(file)
--- Global attributes ---
source: Test program

The 5 dimensions are 1) dim_unlmited = 3 2) depth1 = 12 3) depth2 = 11 4) dim3 = 3 5) dim4 = 4.
dim_unlmited is unlimited in length

----- Get further information about the following variables -----

-1) None of them (no further information)
0) All of the variables
1) time 2) u 3) ureverse
4) uchar1 5) uchar2 6) uchar3
7) ushort 8) ulong 9) udouble
10) no_atts 11) big_var 12) depth1
13) depth2 14) dim3

Select a menu number: 1

--- Information about time(dim_unlmited ) ---

*units: days since 1990-1-1 00:00:0.0 *long_name: Time

----- Get further information about the following variables -----

-1) None of them (no further information)
0) All of the variables
1) time 2) u 3) ureverse
4) uchar1 5) uchar2 6) uchar3
7) ushort 8) ulong 9) udouble
10) no_atts 11) big_var 12) depth1
13) depth2 14) dim3

Select a menu number: 2
--- Information about u(depth1 depth2 ) ---

*long_name: u,5_januarys *units: cm/sec
*ml__FillValue: 10000000270000000 *missing_value: 10000000270000000
*valid_range: -10000000270000000 10000000270000000
*test_double: 100 2000 *test_short: 25 -3 19
*test_long: -4 333 -17 *scale_factor: 3
*add_offset: 0.5

----- Get further information about the following variables -----

-1) None of them (no further information)
0) All of the variables
1) time 2) u 3) ureverse
4) uchar1 5) uchar2 6) uchar3
7) ushort 8) ulong 9) udouble
10) no_atts 11) big_var 12) depth1
13) depth2 14) dim3

Select a menu number: -1

putnc

putnc is a recent addition. It closely corresponds to getnc: it writes a variable to a netCDF file instead of reading it. It has no output arguments and takes the same input arguments as getnc (described above). The arguments output_type, file_status and access_function_in are ignored.
A convenient way to create a netCDF file would be as follows:
  1. Run "ncdump -h" on an existing netCDF file from the command line to create a cdl file.
  2. Edit the cdl file to have the dimensions and attributes that you want.
  3. Run "ncgen -b" from the command line to create an empty netCDF file.
  4. Use putnc from within matlab to write the variables into the file.

set_csiro_access_functions

set_csiro_access_functions sets the default access functions used in the matlab/netcdf interface. The user can find the existing values with a call to get_csiro_access_functions. A change of access functions may be made to get around known problems with one of the drivers. The general form of a set_csiro_access_functions call is:

set_csiro_access_functions(access_function_local, access_function_opendap)

Input arguments:
  1. access_function_local: the access function used for local files. The acceptable values are 'netcdf_api' and 'java'.
  2. access_function_opendap: the access function used for opendap files. The acceptable values are 'netcdf_api' and 'java'.

set_getnc_repeats

set_getnc_repeats is a recent addition. It specifies that if a call to getnc fails it should be repeated at regular intervals. This was written because we have observed intermittent failures with opendap calls.
The general form of a set_getnc_repeats call is:
set_getnc_repeats(num_repeats, pause_interval)
Input arguments:
  1. num_repeats: the number of times to repeat the call to getnc before giving up. The default is zero, i.e., there will be no repeats.
  2. pause_interval: the interval (in seconds) between repeat calls to getnc.

More details

The control of the repeats depends on the global structure CSIRO_getnc_error_handling. You can access this global variable by typing the following in matlab:
>> global CSIRO_getnc_error_handling
The variable CSIRO_getnc_error_handling.num_failures tells you the total number of failures since set_getnc_repeats was last called (when the field num_failures is reset to zero).
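The repeat mechanism amounts to a retry loop with a fixed pause and a running failure count. The Python sketch below is an analogue of set_getnc_repeats, not its implementation: RepeatPolicy is a hypothetical name, and counting every failed attempt (including the final one before giving up) in num_failures is an assumption about what the matlab counter records.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RepeatPolicy:
    """Hypothetical analogue of set_getnc_repeats with a failure counter."""

    def __init__(self, num_repeats: int = 0, pause_interval: float = 0.0) -> None:
        self.num_repeats = num_repeats  # extra attempts after the first
        self.pause_interval = pause_interval  # seconds between attempts
        self.num_failures = 0  # reset whenever a new policy is created

    def call(self, fn: Callable[[], T]) -> T:
        """Run fn, retrying up to num_repeats times on any exception."""
        attempt = 0
        while True:
            try:
                return fn()
            except Exception:
                self.num_failures += 1
                if attempt >= self.num_repeats:
                    raise  # give up and propagate the last error
                attempt += 1
                time.sleep(self.pause_interval)
```

With RepeatPolicy(num_repeats=2), a call that fails twice and then succeeds returns normally, leaving num_failures at 2; with the default of zero repeats the first failure is propagated immediately.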


timenc

timenc finds the time vector and the corresponding base date for a netCDF file or DODS/OPeNDAP dataset that follows the CF conventions (or the older COARDS conventions). In practice this means that the time-like variable should have a units attribute of a certain form. An example is:

'seconds since 1992-10-8 15:15:42.5 -6:00'.

This indicates seconds since October 8th, 1992 at 3 hours, 15 minutes and 42.5 seconds in the afternoon in the time zone which is six hours to the west of Coordinated Universal Time (i.e. Mountain Daylight Time). Instead of 'seconds' the string may contain 'minutes', 'hours', 'days' and 'weeks' and all of these may be singular or plural; they are not case-sensitive.

The time zone specification can also be written without a colon using one or two digits (indicating hours) or three or four digits (indicating hours and minutes). The letters 'UTC' or 'UT' are allowed at the end of the string, but these are ignored. Alternatively, the time zone may be omitted entirely.

Over the years parsetnc (the matlab function that actually parses the string) has been extensively modified so that it can handle many variations of the unit string. These are not documented but are not believed to have any bugs in them.
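A simplified parser for the unit strings described above can be sketched in Python with a regular expression. It handles only the documented core form (unit, "since", date, optional time, optional time zone with or without a colon, optional trailing UTC/UT) and none of the undocumented variations that parsetnc accepts; parse_time_units is a hypothetical name, not parsetnc's interface.

```python
import re
from datetime import datetime, timedelta
from typing import Tuple

UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400, "week": 604800}

_UNITS_RE = re.compile(
    r"^\s*(?P<unit>seconds?|minutes?|hours?|days?|weeks?)\s+since\s+"
    r"(?P<y>\d+)-(?P<mo>\d{1,2})-(?P<d>\d{1,2})"
    r"(?:[ T](?P<h>\d{1,2}):(?P<mi>\d{1,2})(?::(?P<s>\d{1,2}(?:\.\d*)?))?)?"
    r"\s*(?P<tz>[+-]?\d{1,2}(?::?\d{2})?)?"
    r"\s*(?:UTC|UT)?\s*$",
    re.IGNORECASE,
)


def parse_time_units(units: str) -> Tuple[int, datetime]:
    """Return (seconds per unit, base time converted to UT) for a units string."""
    m = _UNITS_RE.match(units)
    if m is None:
        raise ValueError(f"unrecognised time units: {units!r}")
    unit = m.group("unit").lower().rstrip("s")  # 'Days' -> 'day'
    base = datetime(int(m.group("y")), int(m.group("mo")), int(m.group("d")),
                    int(m.group("h") or 0), int(m.group("mi") or 0))
    base += timedelta(seconds=float(m.group("s") or 0))
    offset = timedelta(0)
    tz = m.group("tz")
    if tz:
        sign = -1 if tz.startswith("-") else 1
        body = tz.lstrip("+-")
        if ":" in body:
            hours, minutes = body.split(":")
        elif len(body) <= 2:
            hours, minutes = body, "0"  # one or two digits: hours only
        else:
            hours, minutes = body[:-2], body[-2:]  # three or four digits
        offset = sign * timedelta(hours=int(hours), minutes=int(minutes))
    # Local time = UT + offset, so UT = local time - offset.
    return UNIT_SECONDS[unit], base - offset
```

For the example string above, parse_time_units("seconds since 1992-10-8 15:15:42.5 -6:00") returns a factor of 1 and a base of 21:15:42.5 UT on 8 October 1992.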

The general form of a timenc call is:

[gregorian_time, serial_time, gregorian_base, serial_base, sizem, serial_time_jd, serial_base_jd] = timenc(file, time_var, bl_corner, tr_corner, calendar)

Input arguments:

  1. file: This is a string containing the name of the netCDF file or the URL to the OPeNDAP dataset. It does not have a default.
  2. time_var: the name of the 'time' variable in the netCDF file or DODS/OPeNDAP dataset. If this argument is missing then it is assumed that the variable name is 'time'. If time_var is multi-dimensional then it will be handled as if it had been reshaped as one 'giant' vector.
  3. bl_corner: This is a vector of length n specifying the hyperslab corner with the lowest index values (the bottom left-hand corner in a 2-space). The corners refer to the dimensions in the same order that these dimensions are listed in the inqnc description of the variable. For a netCDF file this is the same order that they are returned in a call to "ncdump". With an OPeNDAP dataset it is the same order as in the DDS. Note also that the indexing starts with 1 - as in matlab and fortran, NOT 0 as in C. A negative element means that all values in that direction will be returned. If a negative scalar (or an empty array) is used this means that all of the elements in the array will be returned. This is the default, i.e., all of the elements of time_var will be returned.
  4. tr_corner: This is a vector of length n specifying the hyperslab corner with the highest index values (the top right-hand corner in a 2-space). A negative element means that the returned hyperslab should run to the highest possible index (this is the default). Note, however, that the value of an element in the tr_corner vector will be ignored if the corresponding element in the bl_corner vector is negative.
  5. calendar: is a string determining the type of calendar to be used and is discussed below under Calendars.
Output arguments:
  1. gregorian_time: an Mx6 matrix where the rows refer to the M times specified in the 'time' variable in the netCDF file.  The columns are the year, month, day, hour, minute, second in that order, UT.
  2. serial_time: an M vector giving the serial times (in UT) specified in the 'time' variable in the netCDF file. Serial times are used by datestr, datevec & datenum. Thus gregorian_time = datevec(serial_time). Note that the 'time' variable actually contains the serial time relative to a base time.
  3. gregorian_base: a 6-vector giving the year, month, day, hour, minute, second of the base time as specified in the 'units' attribute of the 'time' variable. This is in UT.
  4. serial_base: the serial time of the base time, in UT, as determined by matlab's datenum function. Thus gregorian_base = datevec(serial_base). serial_base will be a NaN for times before October 15 1582, when the Gregorian calendar was adopted, since datenum is not meaningful in this case.
  5. sizem: the size of the 'time' variable in the netCDF file.
  6. serial_time_jd: an M vector giving the Julian day number (in UT) specified in the 'time' variable in the netCDF file. (Julian day numbers are used by get_julian_day and get_calendar_date.) Thus gregorian_time = get_calendar_date(serial_time_jd).
  7. serial_base_jd: the Julian day number of the base time, in UT, as determined by get_julian_day. Thus gregorian_base = get_calendar_date(serial_base_jd).

Calendars:

It is possible to have many different types of calendars but timenc only implements five at present.

These are necessary because there is some confusion with dates before October 15 1582 when the Gregorian calendar was introduced. A problem also arises when the reference date in the units attribute is before this. timenc deals with this by recognising some of the CF conventions and returns different answers depending on the value of the calendar attribute of the time-like variable. Also, some numerical models like to pretend that every year has the same number of days - 365, 366 and 360 are all used.

Note that other values of the calendar attribute produce an error message. This can usually be avoided by the user specifying the calendar explicitly in the call to timenc.
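To see why the calendar matters, consider a model calendar in which every month has 30 days and every year 360 days (called 360_day in the CF conventions). The Python sketch below adds a day offset to a base date in such a calendar; add_days_360 is a hypothetical helper, not one of timenc's calendar implementations, and it assumes the base date is itself a valid 360-day date.

```python
import math
from typing import Tuple


def add_days_360(base: Tuple[int, int, int], days: float) -> Tuple[int, int, int, float]:
    """Add `days` to a (year, month, day) base date in a 360-day calendar.

    Every month has 30 days and every year 360 days. Returns
    (year, month, day, hour_of_day).
    """
    year, month, day = base
    if not (1 <= month <= 12 and 1 <= day <= 30):
        raise ValueError("not a valid 360-day calendar date")
    # Count days from the start of year 0, then add the offset.
    total = year * 360 + (month - 1) * 30 + (day - 1) + days
    whole = math.floor(total)
    new_year, day_of_year = divmod(whole, 360)
    new_month, new_day = divmod(day_of_year, 30)
    return new_year, new_month + 1, new_day + 1, (total - whole) * 24
```

An offset of 40.5 days from 1990-1-1 lands at noon on 11 February in this calendar, whereas in the Gregorian calendar (where January has 31 days) the same offset gives noon on 10 February.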

Examples

In the following examples we use our standard OPeNDAP file test_1.nc.

The simplest command line call to make is the following:

>> file = 'http://www.marine.csiro.au/dods/nph-dods/dods-data/test_data/test_1.nc';
>> [gregorian_time, serial_time] = timenc(file);

Note that since the time-like variable is named 'time' we did not even have to put in its name. We now look at the matrix that contains the gregorian time.

>> gregorian_time(1, :)
ans =
1.0e+03 *
1.9900 0.0010 0.0010 0 0 0.0000
>> gregorian_time
ans =
1990 1 1 0 0 0
1990 1 2 0 0 0
1990 2 10 12 0 0

Each row of the matrix gregorian_time contains a time in year, month, day, hour, minute, second format. Thus the last date is noon on 10 February 1990. We can see the same thing by looking at the vector serial_time.

>> size(serial_time)
ans =
3 1
>> datestr(serial_time)
ans =
01-Jan-1990 00:00:00
02-Jan-1990 00:00:00
10-Feb-1990 12:00:00

serial_time gives the time in the serial date number format used by matlab's functions datenum, datevec and datestr, which is why datestr can print the dates directly.

Here we get the 1st and 2nd dates.

>> [gregorian_time, serial_time] = timenc(file, 'time', 1, 2);
>> datestr(serial_time)
ans =
01-Jan-1990 00:00:00
02-Jan-1990 00:00:00
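Since serial_time uses matlab's serial date numbers, the built-in datevec should turn it back into rows with the same [year month day hour minute second] layout as gregorian_time. The sketch below rebuilds the three times of test_1.nc shown above with datenum rather than reading the file, so it runs without network access.

```matlab
% The three times shown above for test_1.nc, entered by hand.
expected = [1990 1 1 0 0 0; 1990 1 2 0 0 0; 1990 2 10 12 0 0];
serial_time = datenum(expected);        % N-by-6 matrix -> column of serial dates
gregorian_time = datevec(serial_time);  % and back to N-by-6 rows
disp(datestr(serial_time(end)))         % the last time: noon, 10 February 1990
```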

whatnc

whatnc lists all of the netCDF files (including compressed ones) in the current directory. It also lists all of the netCDF files in the common data set.

Example

Below is a possible listing returned by whatnc.

>> whatnc
----- current directory netCDF files -----
bar.cdf foo.cdf mycdf.cdf test_1.nc test_timenc.nc
----- current directory compressed netCDF files -----
EMPTY
----- common data set of netCDF files -----
bath_agso_2002.nc soc_climatology.nc
bath_agso_98.nc sst.mnmean.1981-present.nc

The list under the 1st heading shows all of the files in the current directory that seem to be netCDF files. This is based simply on whether their names end in .cdf or .nc. Note that the .cdf suffix was used in the past to indicate a netCDF file but is no longer recommended.

The list under the 2nd heading shows all of the files that end in nc.gz, nc.Z, cdf.gz or cdf.Z. These are presumed to be compressed netCDF files.

The 3rd list shows netCDF files in the area referred to as the common data directory. This directory will be searched by the inqnc, attnc and getnc commands and is set by the local system manager. This is done by simply editing the pos_cds.m file.
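The first two whatnc lists depend only on file-name suffixes. The sketch below applies that rule to a made-up list of names using regexp; it illustrates the rule described above and is not whatnc's own code. This sketch matches suffixes case-sensitively.

```matlab
% Made-up directory listing; in practice something like
%   listing = dir; names = {listing(~[listing.isdir]).name};
names = {'foo.cdf', 'test_1.nc', 'data.nc.gz', 'notes.txt', 'old.cdf.Z'};

% A name counts as netCDF if it ends in .nc or .cdf, and as compressed
% netCDF if it ends in .nc.gz, .nc.Z, .cdf.gz or .cdf.Z.
is_netcdf = @(n) ~isempty(regexp(n, '\.(nc|cdf)$', 'once'));
is_compressed = @(n) ~isempty(regexp(n, '\.(nc|cdf)\.(gz|Z)$', 'once'));

plain = names(cellfun(is_netcdf, names))
compressed = names(cellfun(is_compressed, names))
```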


Installation

The CSIRO interface has been installed on both unix and Windows pc systems. Installation is mostly a matter of copying the appropriate files to directories and then making them visible to matlab. Accordingly the experience should easily translate to other operating systems. Note that steps 4, 5 and 6 will improve the user experience, but are not necessary.

  1. Download either matlab_netCDF_OPeNDAP.tar.gz or matlab_netCDF_OPeNDAP.zip (the files in each are identical). Copy the downloaded file to a chosen directory (let's call it $MATLAB) and expand it using either gunzip and tar or unzip as appropriate.

  2. The directory $MATLAB needs to be in the matlab search path. One way to do this is to use the matlab command addpath in your startup.m file. Alternatively, see this discussion of the matlab search path.

  3. The toolsUI.jar driver is required to read netcdf files with old versions of matlab (version 7.6 and earlier) and to read opendap files. Download the latest version of toolsUI.jar and copy it to the same $MATLAB directory as before.

  4. Now toolsUI.jar needs to be on the java class path. You could do this by using the matlab function javaaddpath in your startup.m file. Alternatively, see this discussion of java classes in matlab.
  5. Test the installation. When the directory structure was expanded in step 1 a subdirectory named test was created. Go to this subdirectory, start matlab and type test_all. This gives you options to test both the netCDF and the OPeNDAP installations. It does that by reading some data from a supplied netCDF file or from an OPeNDAP server. The data are compared to those in a supplied mat-file. If you get to the end successfully then test_all will give you a timing message. It is interesting to see how much slower it can be to access the data remotely via the OPeNDAP interface. There are a few common errors:
    1. Part of the interface may not be visible to matlab if the matlab paths are set incorrectly.
    2. The testing of the OPeNDAP installation may fail sometimes because of dropouts in the internet somewhere. If only one of the tests fails then try repeating the exercise.
    3. In older versions of matlab the interface may fail due to a namespace clash. This results in java.io.IOException error messages that mention ucar.nc2.dataset. The solution to the problem is discussed here.
  6. Finally you can customise your installation if you want. After doing that run the test described in the previous step to check that things still work.
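Steps 2 and 4 are often done once in startup.m. The following is a minimal sketch assuming $MATLAB is the hypothetical directory /home/user/matlab_netcdf; substitute your own location. The exist checks simply stop matlab from failing at startup if the directory or toolsUI.jar is missing.

```matlab
% startup.m: make the CSIRO interface and toolsUI.jar visible to matlab.
% '/home/user/matlab_netcdf' is a hypothetical stand-in for $MATLAB.
csiro_dir = '/home/user/matlab_netcdf';
if exist(csiro_dir, 'dir')
    addpath(csiro_dir);                       % step 2: matlab search path
    jar_file = fullfile(csiro_dir, 'toolsUI.jar');
    if exist(jar_file, 'file')
        javaaddpath(jar_file);                % step 4: dynamic java class path
    end
else
    warning('CSIRO netCDF interface not found in %s', csiro_dir);
end
```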

Customization

  1. The CSIRO interface needs to decide how to read a given type of file. The choice is between the native netcdf api (which is available in later versions of matlab) and the java interface. The user can force it to use a particular one in the call to getnc by specifying access_function_in. The default is set by a call to the function choose_access_function.m, which is in the private directory. You can edit the first few lines of this file to control the default, as described in the help message for choose_access_function.m. By default, for matlab versions later than 7.6 the native netcdf api is used for reading local netcdf files, and the java method is used for earlier versions. The java method is used for all opendap files.
  2. If you have a local data set of netCDF files that you want to be accessible to matlab without the user having to specify the path name then you can edit the file pos_cds.m. This matlab function will be used by getnc, attnc, timenc, inqnc, ddsnc and whatnc when it is trying to find a given netCDF file.
  3. whatnc can print a message that is specific to a given site. Simply create a matlab script named message_for_whatnc.m and put it in the same directory as whatnc.m. After whatnc has printed its primary information it will print the local message that it gets from message_for_whatnc.m. A documented example script is included with the interface and you should edit it to fit your local requirements.
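The default described in item 1 amounts to a small decision rule. The sketch below is a hypothetical restatement for illustration only; it is not the contents of choose_access_function.m, and its names are made up. The matlab version is passed as a [major minor] pair.

```matlab
% Hypothetical restatement of the default driver choice described above.
% v is a [major minor] version pair, e.g. [7 14] for R2012a.
newer_than_7_6 = @(v) v(1) > 7 || (v(1) == 7 && v(2) > 6);
drivers = {'java', 'native'};
% OPeNDAP always uses java; local files use the native api on newer matlab.
default_driver = @(is_opendap, v) drivers{1 + (~is_opendap && newer_than_7_6(v))};

default_driver(false, [7 14])   % local file, R2012a
default_driver(true,  [7 14])   % OPeNDAP url, R2012a
```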

Drivers

The CSIRO interface is only a wrapper that makes it easier to get data. The actual retrieval of the data is carried out by either the native netcdf api (which is available in later versions of matlab) or the java interface which uses toolsUI.jar as described below. Note that earlier versions of the CSIRO netcdf interface were able to use other drivers (such as mexnc and loaddap) but we no longer support these as they don't add any extra functionality. To find out which drivers are used there is the function get_csiro_access_functions. The function set_csiro_access_functions allows the user to change the default drivers.

native netcdf api

Advantages:
  1. As it is built in, the user doesn't need to install or maintain it.
  2. The OPeNDAP reader can deal with username:password authentication.

Disadvantages:
  1. There is a bug in reading character arrays via OPeNDAP - see here.

toolsUI.jar

Advantages:
  1. This can retrieve both netCDF files and data from an OPeNDAP server.
  2. It is supported by Unidata (who maintain both netCDF and OPeNDAP) and so it is likely to be up-to-date. This may be of special importance as later versions of netCDF files become more popular.

Disadvantages:
  1. It has lower memory limits than the native api, resulting in “java.lang.OutOfMemoryError: Java heap space” messages when retrieving files larger than about 147 MB. (This is discussed here.)
  2. It does not allow access to OPeNDAP data that requires username:password authentication.
  3. It may be slower to read netCDF files than the native api (although this has not been tested).
  4. In older versions of matlab the interface may fail due to a namespace clash. The solution to the problem is discussed here.


Portability and known problems

Portability

The software in this package is entirely made up of matlab script files and works with all versions of matlab from 7.5 (R2007b) onwards.

Bugs

There are no known bugs in the CSIRO interface, although there are some problems with the netcdf api that is built into matlab and with the java interface. There are also some common problems when using the interface.

Native api errors with character arrays

The native netcdf api can be used to read OPeNDAP variables (for details type "help netcdf" in matlab). However, when retrieving character arrays it adds an extra dimension of length 64. In our test netCDF file there is a variable uchar2 that is dimensioned uchar2(depth1 depth2). If the native api reads it from an OPeNDAP server it "seems" to have dimensions uchar2(depth1 depth2 maxStrlen64), where maxStrlen64 is 64. The extra 63 slices are all nulls.

This seems to be a problem with the C code in the underlying DAP libraries, since other software developers have seen the same problem. See, for example, http://sourceforge.net/projects/nco/forums/forum/9829/topic/5474372 where the author says "In this case DAP translates scalar characters into NUL-terminated character arrays (of length 64).".

The problem does not occur with the java interface, so we will continue to use that for OPeNDAP access until the MathWorks or Unidata fixes the problem. Neither body has shown any interest in doing so.
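If you do read character data with the native api, one possible workaround is to keep only the first slice along the extra length-64 dimension, since the other 63 slices are reported above to be nulls. This is a minimal sketch, not part of the interface: the array below is made up to stand in for uchar2 as read over OPeNDAP, and the position of the padding dimension (pad_dim) depends on the dimension order your code sees, so check size() first.

```matlab
% Made-up stand-in for uchar2 read via the native api: a 2-by-3 char
% array plus a spurious padding dimension of length 64 (all nulls).
uchar2 = char(zeros(2, 3, 64));
uchar2(:, :, 1) = ['abc'; 'def'];

pad_dim = 3;                                % assumed position of the padding dimension
sz = size(uchar2);
keep = setdiff(1:ndims(uchar2), pad_dim);   % the genuine dimensions
moved = permute(uchar2, [keep, pad_dim]);   % put the padding dimension last
flat = reshape(moved, [], sz(pad_dim));     % one column per padding slice
fixed = reshape(flat(:, 1), [sz(keep), 1]); % keep slice 1 and restore the shape
disp(fixed)
```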

java OutOfMemoryError

When using the java virtual machine to retrieve OPeNDAP data there may be “java.lang.OutOfMemoryError: Java heap space” errors due to running out of heap space. In some tests the limit has been around 147 MB. The MathWorks has a web page here explaining how to increase the limit on heap space.
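Before making a large OPeNDAP request it can help to know the current heap limit. The snippet below queries the standard java.lang.Runtime class from matlab; it only reports the limit and does not change it. The 147 MB figure above came from particular tests, so your value will probably differ.

```matlab
% Ask the Java virtual machine running inside matlab for its heap limit.
rt = java.lang.Runtime.getRuntime();
max_heap_mb = double(rt.maxMemory()) / 2^20;
fprintf('Maximum Java heap: %.0f MB\n', max_heap_mb);
```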

OPeNDAP password authentication

Using the java virtual machine to retrieve OPeNDAP data will fail if the OPeNDAP server requires username:password authentication. The native netcdf api handles this properly, though, in version 7.14 (R2012a) and later.

java namespace clash

In older versions of matlab there may be a namespace clash with the mwucarunits.jar file. This results in java.io.IOException error messages that mention ucar.nc2.dataset. You can check for the file by running the following code fragment:

% Print any static java class path entries that mention mwucarunits
p = javaclasspath('-static');
for ii = 1:numel(p)
    if ~isempty(strfind(p{ii}, 'mwucarunits'))
        disp(p{ii})
    end
end

If the clash is present this would typically print a file name like "/home/matlab7.6/java/jarext/mwucarunits.jar". The way to deal with this is to edit the file classpath.txt and remove (or comment out) the line that contains "mwucarunits.jar".

The default classpath.txt file resides in the toolbox/local subdirectory of your MATLAB root directory.

The reference to "mwucarunits.jar" should be removed because the file contains an old implementation of the Unidata udunits package that conflicts with the more recent version that NetCDF-Java uses. (It appears that "mwucarunits.jar" is only used by the MathWorks "Model-Based Calibration Toolbox", so removing it should not cause a problem.)

Confusion with missing values

When reading some netCDF files getnc will return a missing value indicator (by default a NaN) in some places where there shouldn't be one. This is not due to a bug in getnc but occurs when the netCDF file does not follow the attribute conventions (see http://www.unidata.ucar.edu/software/netcdf/docs/netcdf.html#Attribute-Conventions). Two relevant quotes from the documentation are:

The type of each valid_range, valid_min and valid_max attribute should match the type of its variable
(except that for byte data, these can be of a signed integral type to specify the intended range).

and

If _FillValue is defined then it should be scalar and of the same type as the variable.

To illustrate what this means and how a problem can occur consider the following extract from an example cdl file.

short airtemp(time, lat, lon) ;
airtemp:long_name = "Air temperature at surface" ;
airtemp:valid_range = -10000s, 10000s ;
airtemp:units = "degC" ;
airtemp:scale_factor = 0.01f ;
airtemp:_FillValue = 32766s ;

What has happened here is that the creator of the netCDF file has chosen to save space by storing the data as shorts (2 byte integers). The software reading the data will then multiply the integer values by the scale_factor of 0.01 to produce the floating point value of the air temperature. Since the integers can take values between -32768 and 32767, this can represent temperatures between -327.68 and 327.67 degrees with a resolution of 0.01 degrees.

Note, however, that the valid_range goes from -10000 to 10000. Generic software interprets stored values outside this range as faulty in some way, and the default behaviour of getnc is to replace such values with a NaN. The creator of the file can use this to mark missing or contaminated data. Since the temperatures implied by these limits are -100 and 100 Celsius, the limits are “safe” in that they only exclude physically unreasonable data.

This way of defining the valid_range, with the same type and in the same packed units as the stored variable, is what is specified in the earlier quote.

A problem arises when the creator of the netCDF file misunderstands the attribute convention. They choose an “intuitive” definition of the attribute like:

airtemp:valid_range = -100.0f, 100.0f ;

Here they are thinking in terms of the true air temperature rather than the scaled version stored as integers. When getnc reads the valid_range attribute it multiplies it by 0.01 and concludes that any temperatures outside the range -1.0 to 1.0 are to be replaced by NaNs. Note that the same problem occurs when the file's creator makes the same error with other attributes: valid_min, valid_max, _FillValue and missing_value.
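The effect of the two valid_range definitions can be reproduced with a few lines of matlab. This sketch does not call getnc; it applies the attributes from the CDL extract by hand to some made-up stored values, checking valid_range and _FillValue against the stored integers and then applying scale_factor (there is no add_offset in this example). For a positive scale_factor this gives the same result as scaling valid_range first, as getnc is described doing above.

```matlab
% Made-up stored (packed) values of airtemp, as 2-byte integers.
raw = int16([-20000 -1500 0 2345 32766]);
scale_factor = 0.01;
fill_value = 32766;

% True where a stored value lies inside valid_range and is not _FillValue.
is_valid = @(r, vr) r >= vr(1) & r <= vr(2) & r ~= fill_value;

x = double(raw) * scale_factor;    % unpacked temperatures in degC

good = x;                          % valid_range in packed units, as the convention asks
good(~is_valid(raw, [-10000 10000])) = NaN;

bad = x;                           % valid_range wrongly given in degC
bad(~is_valid(raw, [-100 100])) = NaN;

disp([good; bad])
```

Only the zero survives in bad: the perfectly reasonable -15 and 23.45 degrees are discarded because they lie outside the mistaken range of -100 to 100 stored units, that is, -1.0 to 1.0 degC.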

There are several workarounds for this problem. The simplest is to pass getnc the argument change_miss = 1. This will cause all values to be passed unchanged (apart from the rescaling implied by the scale_factor attribute). The disadvantage is that when very large values were used to indicate faulty data these will also be returned - in the example above you might end up with some temperatures greater than 100C.

The trickier but more satisfactory option is to use the rescale_opts option in getnc. It was designed to deal with errant netCDF files and is described here.


Revision history

The following is a partial history of revisions. I intend to keep it more up-to-date from version 3.0 onwards. In particular, bug fixes will be recorded.


Alternative ways of accessing netCDF and OPeNDAP data

There are a number of alternative ways of reading netCDF and OPeNDAP data into matlab. In most cases the time and computer resources taken to retrieve data will depend mostly on external factors such as internet bandwidth and disk access speed. Hence it would be surprising if one of these methods were significantly more efficient than any of the others.

The most up-to-date place to look for netcdf software is probably here. A good place for opendap software is here.


Disclaimer

This software is provided "as is" without warranty of any kind. It is covered by a general CSIRO Legal Notice and Disclaimer.


People involved in the development of the interface

The CSIRO matlab interface has been mostly written by Jim Mansbridge with some welcome input from Peter McIntosh and Rose O'Connor (all of CSIRO).


Contact details

This web page is maintained by Jim Mansbridge, CSIRO Marine and Atmospheric Research.

Postal address: GPO Box 1538, Hobart, Tasmania 7001, Australia
Phone: +61-3-62 32 5416
Fax: +61-3-62 32 5123


This page is http://www.marine.csiro.au/sw/matlab-netcdf.html.


Further details on the research of the CSIRO Marine and Atmospheric Research are available through the CMAR Home Page.

For more information contact reception@marine.csiro.au or telephone +61-3-62325222. Unless otherwise indicated all contents in these web documents are copyright © 1997 CSIRO.