Aus Argo ArgoRT Doco Page


File Types of ArgoRT processing system

Sections on this page
Calibration reference, Database, FTP downloads, matfiles, netCDF files, Processing Records, Workfiles

Calibration reference

Reference CTD casts are stored in matfiles in cal_data/. These may be changed so that a climatology is used as the deep salinity reference.


Database - Spreadsheet

An Excel spreadsheet stores a large amount of technical data about each float and its instruments and deployment. Some of this information is critical to correctly decoding profiles, but much of it is just for loading into netCDF files.

ArgoRT cannot read .xls files directly, so whenever the spreadsheet is modified the main and calibration sheets need to be saved to csv files (called argomaster.csv and argomaster_cal.csv respectively). These files are then read by getdbase and getcaldbase respectively.

A web version of argomaster.csv is maintained by running web_database after any mods. If new floats are added to the spreadsheet or any statuses are changed, then also run web_select_float to update the pages that allow selection of the summary page for each float.

Note that the software is hardcoded for the exact structure of the spreadsheet - if that is changed then software modifications will be required!

Note that many fields in the spreadsheets are mandatory - without a sensible value a system crash may occur! There is some checking for this in getdbase.m - respond to any related warnings!

Note all fields in the calibration sheet are mandatory (ones which are not relevant may have a NaN). If any fields are left empty then meta.nc files cannot be generated for that float [a warning will be issued.]
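The mandatory-field rule can be illustrated with a small check. This is only a sketch in Python, not the checking done in getdbase.m; the column names and sample rows are made up for illustration, since the real spreadsheet layout is not described on this page.

```python
import csv
import io

# Hypothetical column names; the real spreadsheet columns are not listed here.
MANDATORY = ["wmo_id", "status", "launch_date"]


def missing_mandatory(csv_text, mandatory=MANDATORY):
    """Return {row_number: [empty mandatory columns]} for rows with gaps.

    A literal "NaN" counts as filled, matching the rule that irrelevant
    calibration fields may hold a NaN but must not be left empty.
    """
    problems = {}
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=1):
        empty = [name for name in mandatory
                 if not (row.get(name) or "").strip()]
        if empty:
            problems[row_number] = empty
    return problems


# Made-up sample: the second float has no status.
sample = (
    "wmo_id,status,launch_date\n"
    "5900865,live,NaN\n"
    "5900866,,2005-03-02\n"
)
print(missing_mandatory(sample))  # {2: ['status']}
```

Running a check like this on the exported csv before a processing run would surface empty fields early, rather than at the point where a meta.nc file fails to generate.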


FTP download files

We automatically get an ftp download from Argos 4 times per day. Each download is collected in a text file in argos_downloads/, but the first 3 for each day are overwritten by each successor (as they are named by the year and Julian day.)
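The overwriting follows directly from the naming scheme, as the sketch below shows. Only "named by the year and Julian day" comes from this page; the file name pattern and the download hours are assumptions chosen for illustration.

```python
from datetime import datetime


def download_name(when):
    """Build a per-day download file name from year and day-of-year.

    The "argosYYYY_DDD.log" pattern is hypothetical; the page only says
    files are named by the year and Julian day.
    """
    day_of_year = when.timetuple().tm_yday
    return "argos_downloads/argos{:04d}_{:03d}.log".format(when.year, day_of_year)


# Four downloads on the same day (hours are made up) map to one name,
# so each download replaces the previous one.
times = [datetime(2006, 11, 17, hour) for hour in (0, 6, 12, 18)]
names = {download_name(t) for t in times}
print(names)  # {'argos_downloads/argos2006_321.log'}
```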

These files are kept as the primary backup, and may be reworked if it is found that extra information can be extracted from them. They should not be modified.

We should be able to do all manual intervention in the Workfiles, but much of it could be done by editing a copy of the download, as was done in the past. To facilitate this, extract_argos_msg can be used to create a file (in work/) containing a portion (using just one profile) from a download file.

Note: ArgoRT always processes the most recent file found in argos_downloads. This is another reason not to muck around in there - we should keep these files in the order that they were created.


Argo matfiles

We store the decoded data in one matfile per float, in matfiles/. Each profile is stored in a structure, in a structure array called float. The structure is created (and described and defined for users) in new_profile_struct.

No attempt should be made to manually work with these files in Matlab! This is to protect the integrity of the files, and to ensure that all operations on the data are recorded in the workfiles and so can be replicated.

Presently, only these operations involve the matfiles:
  • Files may be created by conversion of old-format matfiles

  • Files may be created, profiles appended, or profiles overwritten, by the ArgoRT system (in process_profiles.)

  • Profile structures or whole float arrays may be extracted to examine using getargo. There is deliberately no means to put these back into the matfiles.

  • rework_flag_set may be used carefully (to set one flag in each profile in some subset of the matfiles), if a decision is made to reprocess some profiles.

  • Profiles may be removed, inserted, or sorted using matfile_edit (see below).

  • A programmer needing to add new fields or carefully repair the files may need to use fix_all_matfiles.


matfile_edit

    If a junk profile is somehow created in a matfile, OR room has to be made for a missed profile (ie a later one has already been added to the file), OR the profiles are somehow out of order, then matfile_edit can be carefully used to remove profiles or insert blank profiles as place holders, or rearrange profiles by time (jday(1)).

    The tricky part of the inserting is that, to be of any use, the inserted profile has to have a 'jday' close enough to the missed one so that it is matched with the new one when it is processed. Presently this is done by providing estimates based on surrounding profiles and checking them with the user.

    All 3 of these operations have a messy by-product! Workfiles are named according to element number in the matfile array, so any changes to the order of profiles will break the correspondence of individual Workfiles with the matfile profiles. It is messy and risky, but I think you should carefully rename workfiles to reflect the changes in the matfile.
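To make the renaming less error-prone, a rename plan can be worked out before touching any files. The following is a sketch only, not part of ArgoRT: it assumes the N<np>_P<profile_number>.mat naming described in the Workfiles section, that np counts array elements from 1, and that each profile_number occurs once per float. The function names are hypothetical.

```python
def workfile_name(np, profile_number):
    """Workfile base name for array element np and a given profile_number."""
    return "N{}_P{}.mat".format(np, profile_number)


def rename_plan(old_order, new_order):
    """List (old_name, new_name) pairs for profiles whose element number moved.

    old_order and new_order hold profile_number values in matfile array
    order, before and after the edit. Profiles that are new in new_order
    (for example a blank placeholder) have no workfile yet and are skipped.
    """
    old_np = {p: i for i, p in enumerate(old_order, start=1)}
    plan = []
    for new_np, p in enumerate(new_order, start=1):
        if p in old_np and old_np[p] != new_np:
            plan.append((workfile_name(old_np[p], p), workfile_name(new_np, p)))
    return plan


# A placeholder for missed profile 3 is inserted ahead of profile 4.
print(rename_plan([1, 2, 4], [1, 2, 3, 4]))  # [('N3_P4.mat', 'N4_P4.mat')]
```

Because the profile_number is part of each name, the old and new names in a plan cannot collide with each other as long as profile numbers are unique, but it is still worth reviewing the plan by eye before renaming anything.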


    netCDF files

    netCDF files are used to transfer all data to the GDACs. They may also be used to pass data to some of our colleagues (but generally the matfiles would be better.)

    netCDF files are generated in netcdf/ as each profile is processed by ArgoRT. The per-profile files may optionally be deleted or saved to a backup area after they have been transmitted to the GDACs. They are very easily regenerated from the matfiles.

    The "metadata", "technical" and "trajectory" files take longer to generate as they cover all profiles for a given float, so these are kept in netcdf/ and just extended with each new profile. After each extension a temporary copy is made in export/, which will be deleted after transfer.

    Transfers to GDACs are flaky, so we repeat each transfer a number of times. The number of repeats is controlled by System parameters. Profile files are sent .send_cdf_max times, and metadata files .send_meta_max times. Because metadata files are merely updated with each profile, we are not so fussed about getting them through every time, so we may set send_meta_max=1, but send_cdf_max=4 is recommended.
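The repeat-sending idea can be sketched as a per-file counter compared against a maximum. This is an illustration only: how ArgoRT actually stores and advances its counters is not shown on this page, and the function and file names below are hypothetical.

```python
def files_to_send(counters, max_sends):
    """Return the files still due for transmission and count this attempt.

    counters maps a file name to the number of times it has been sent.
    A file is due while its count is below max_sends.
    """
    due = []
    for name in sorted(counters):
        if counters[name] < max_sends:
            counters[name] += 1
            due.append(name)
    return due


send_cdf_max = 4  # recommended value from the text above
counters = {"profile_032.nc": 0}  # hypothetical file name

# Six processing runs: the file goes out on the first four only.
sent_per_run = [files_to_send(counters, send_cdf_max) for _ in range(6)]
print([len(sent) for sent in sent_per_run])  # [1, 1, 1, 1, 0, 0]
```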

    The GDACs have very finicky format-checking and emails will be received if files are rejected. This is not always about format (see "First profile" in Operations page.)


    Processing records files

    The processing records files are used to
    1 capture information about events in processing a profile, for reporting on the daily webpage

    2 capture information about whole processing runs

    3 hold counters which control transmission of GTS and GDAC products

    Functions 1 & 3 are performed by an array of structures called PROC_RECORDS. Its format is created by new_proc_rec_struct.m. There is only one entry per float, so only one profile per float can be reported and controlled at one time (it is very much tailored for RT processing rather than reprocessing!)

    Function 2 is performed by a small array of structures called ftp_details. This has 5 elements, recording details of the ftp download used for each of the last 5 runs. Each run, these elements are shuffled down and the latest details loaded into element 1.
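The shuffle-down behaviour of ftp_details amounts to a fixed-length history with the newest entry first. A minimal sketch, using plain strings in place of the real structures (whose fields are not described here) and a hypothetical function name:

```python
def record_run(ftp_details, latest, size=5):
    """Put the latest run details first and drop anything beyond `size`."""
    return ([latest] + ftp_details)[:size]


history = []
for run in range(1, 7):
    history = record_run(history, "run{}".format(run))

# After six runs the oldest entry (run1) has dropped off the end.
print(history)  # ['run6', 'run5', 'run4', 'run3', 'run2']
```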

    Essentially there is only one records file, called Argo_proc_records.mat, and it is continually updated. However, mainly so the software doesn't get upset, a records file must be provided when doing any reprocessing. By default this is reprocessing_records.mat. If required, your own version can be created (by hand or using proc_recs_rebuild.m), and its name passed to the processing programs. A webpage could be generated from this - but presently a little recoding would be required to stop this clobbering the real RT webpages.


    Workfiles

    A workfile is a mat-file containing the raw message for one profile. That is, it contains the 4 variables described below. Operator intervention to overcome errors in the message is performed in the workfile. The profile is then regenerated from the workfile. Reprocessing of a group of profiles or floats would be done from workfiles. In this way, previous operator intervention is not lost or overwritten.

    For profile np=30 profile_number=32, float 5900865, the workfile is called workfiles/5900865/N30_P32.mat. (Note that if the order of profiles in the Argo matfile is changed [see section "matfile_edit" above] then this naming will not match the new np number unless manually corrected.)
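The naming rule can be written down as a pair of helper functions, one to build a workfile path and one to read the numbers back out of it. This is a sketch based only on the single example above; whether ArgoRT ever zero-pads the numbers is not stated, so no padding is assumed, and the function names are hypothetical.

```python
import re

# Pattern inferred from the one example path given in the text.
WORKFILE_RE = re.compile(r"^workfiles/(\d+)/N(\d+)_P(\d+)\.mat$")


def workfile_path(wmo_id, np, profile_number):
    """Path of the workfile for array element np of the given float."""
    return "workfiles/{}/N{}_P{}.mat".format(wmo_id, np, profile_number)


def parse_workfile_path(path):
    """Return (wmo_id, np, profile_number) parsed from a workfile path."""
    match = WORKFILE_RE.match(path)
    if match is None:
        raise ValueError("not a workfile path: " + path)
    wmo_id, np, profile_number = (int(group) for group in match.groups())
    return wmo_id, np, profile_number


path = workfile_path(5900865, 30, 32)
print(path)                       # workfiles/5900865/N30_P32.mat
print(parse_workfile_path(path))  # (5900865, 30, 32)
```

Parsing the name back is handy for spotting workfiles whose np no longer matches the matfile after the profile order has been changed.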

    strip_argos_msg keeps all useful data from each profile message, but decoded to decimal numbers, and separated into 4 different types of data in variables:

    The order of data elements in the ftp download is retained (as .lineno) so the download can be reconstructed to help determine groups of bad or irrelevant data. The rawdat lines are also labelled by Block number (.blkno).

    If any modifications are made to a workfile, a .qc field is attached to each variable. This field has a value for each line of data (0=good or 1=reject).
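How the .qc flags select data can be shown in a few lines. This is a sketch of the flag convention only (0=good, 1=reject, and no .qc field until a workfile has been modified); the data values are made up and the function name is hypothetical.

```python
def accepted_lines(values, qc=None):
    """Return the data lines that are not rejected.

    qc is None for a workfile that has never been modified, in which
    case every line is kept. Otherwise qc holds one flag per line:
    0 = good, 1 = reject.
    """
    if qc is None:
        return list(values)
    if len(qc) != len(values):
        raise ValueError("qc must have one flag per data line")
    return [value for value, flag in zip(values, qc) if flag == 0]


lines = ["line1", "line2", "line3"]          # made-up data lines
print(accepted_lines(lines))                 # ['line1', 'line2', 'line3']
print(accepted_lines(lines, qc=[0, 1, 0]))   # ['line1', 'line3']
```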

    A workfile is generated when a profile is processed. If a workfile already exists (eg when doing stage 2, the file will already exist from stage 1), and there is now different data, then it will be overwritten unless it has been edited. A report is issued, as staff may wish to see if the new data alters the profile data that resulted from stage 1 processing. If the existing file has been edited and there is new data, then the new data is saved in a separate file with extension '_A' and the operator is notified. These differences should be investigated - maybe there is useful new data? If not, and the old editing is still useful, then the new file could be manually removed. If the old file is edited but there is no new data then no new file is written.
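The rules in the paragraph above reduce to a small decision table, sketched here. The action labels and function name are hypothetical. One case is not covered by the text, an unedited existing file with no new data; keeping the file untouched is an assumption of this sketch.

```python
def workfile_action(exists, edited, has_new_data):
    """Decide what to do with a workfile when a profile is processed.

    Returns one of the (hypothetical) labels:
      "write"      no workfile yet, so create one
      "overwrite"  unedited file and different data: replace it and report
      "save_as_A"  edited file and different data: keep the edited file,
                   save the new data with extension '_A'
      "keep"       no new data, so nothing is written
    """
    if not exists:
        return "write"
    if not has_new_data:
        # Stated in the text for edited files; assumed for unedited ones.
        return "keep"
    if edited:
        return "save_as_A"
    return "overwrite"


print(workfile_action(exists=True, edited=True, has_new_data=True))   # save_as_A
print(workfile_action(exists=True, edited=False, has_new_data=True))  # overwrite
```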

    Modifications can be made to the workfile using edit_workfile, and reprocessing performed directly from it. edit_workfile provides various views of the data, allows lines to be rejected or accepted by setting flags, allows modified data to be tested, and allows a profile to be submitted for reprocessing.

    To reprocess a group of profiles or floats from workfiles, we use run_from_workfile.

    Last updated 17 Nov 2006