Further details of the files, etc. used to test the auto QC system...

These procedures are being tested using a combined data set created by extracting all data within the boundaries:

0N to 70S, 90E to 145E
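
The extraction step can be illustrated with a minimal sketch. It assumes the four values are latitude limits (0N to 70S) and longitude limits (90E to 145E), that south latitudes are stored as negative numbers, and that the limits are inclusive. The name in_region and the (lat, lon) tuples are hypothetical and are not taken from the actual extraction programs.

```python
# Hypothetical sketch of the regional extraction; not the project's actual code.
# Assumes south latitudes are negative, east longitudes positive, and
# inclusive boundaries.
LAT_NORTH, LAT_SOUTH = 0.0, -70.0    # 0N, 70S
LON_WEST, LON_EAST = 90.0, 145.0     # 90E, 145E


def in_region(lat, lon):
    """Return True if a cast position lies inside the test region."""
    return LAT_SOUTH <= lat <= LAT_NORTH and LON_WEST <= lon <= LON_EAST


# Made-up positions, for illustration only.
positions = [(-35.0, 115.0), (10.0, 100.0), (-20.0, 150.0)]
selected = [p for p in positions if in_region(*p)]
print(selected)
```

Only the first of the three made-up positions falls inside the box; the second is north of the equator and the third is east of 145E.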

The data came from a number of sources. The base data set was the CSIRO data, referred to here as Mership and given a priority of "1". The next data sets added were the WOCE data, which have been QC'd by the Indian Ocean Data Assembly Centre, both here at CSIRO and at JAFOOS. These were given a priority of "2" to reflect the fact that the data come from outside sources (the duplicate checking programs are used to eliminate all CSIRO data from the WOCE sets before they are added). The latest data from the Bureau of Meteorology were also given a priority of "2", since some of the QC has not been rechecked. We also added the far seas fisheries data, because they cover areas where coverage is otherwise sparse (priority = "7"), and the WOA observed level data for similar reasons (priority also = "7"). We have more data to add, but decided to begin program development with these basic data sets; otherwise we would still be assembling data next year.
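
The priorities listed above could be held in a simple lookup table, as in the sketch below. The priority numbers come from the text. The source labels other than Mership, the function keep_preferred, and the rule that a lower number wins when two casts are judged to be duplicates are all assumptions made for illustration. They do not describe how the actual duplicate checking programs work.

```python
# Hypothetical sketch: priority numbers are as listed in the text; the labels
# and the duplicate rule are assumptions made for illustration only.
PRIORITY = {
    "Mership": 1,            # CSIRO base data set
    "WOCE": 2,               # QC'd by the Indian Ocean Data Assembly Centre
    "BoM": 2,                # Bureau of Meteorology, some QC not rechecked
    "FarSeasFisheries": 7,   # added for sparsely covered areas
    "WOA_observed": 7,       # WOA observed level data
}


def keep_preferred(duplicates):
    """From a group of casts judged to be duplicates, keep one.

    Assumption: a lower priority number marks the preferred source.
    Ties keep the cast that appears first in the group.
    """
    return min(duplicates, key=lambda cast: PRIORITY[cast["source"]])


# Made-up cast records, for illustration only.
group = [
    {"id": "w-001", "source": "WOCE"},
    {"id": "m-001", "source": "Mership"},
]
print(keep_preferred(group)["id"])
```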

To check the performance of the programs, an entire month of data was checked by hand. We chose April because it happened to be the next one in line for screening. All percentages are relative to the 5781 casts in this subset of the larger data file (which contains 94471 casts so far). The number of casts in a particular subset changes as duplicate checking procedures are refined and more data sets are added (or the area covered changes).
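
As a worked example of the percentage convention, the sketch below uses the two cast counts quoted above. The function name percent_of_april is hypothetical.

```python
# Cast counts quoted in the text.
APRIL_CASTS = 5781
TOTAL_CASTS = 94471


def percent_of_april(n_casts):
    """Express a cast count as a percentage of the April subset."""
    return 100.0 * n_casts / APRIL_CASTS


# Share of the full data file represented by the April subset.
april_share = 100.0 * APRIL_CASTS / TOTAL_CASTS
print(round(april_share, 2))  # 6.12
```

The April subset is therefore about 6% of the casts assembled so far. Both counts will shift as data sets are added, so any percentage should be quoted together with the cast total it was computed from.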

The next step is to refine the statistics slightly (particularly the treatment of bottle data, which has been a problem because of its low resolution) and rerun the programs, then check May. This should be completed early next week.