Semi-Automated QC of Upper Ocean Thermal Data at CSIRO
 

The goal of the Indian Ocean Temperature Archive (IOTA) project is to produce the best quality data set possible given the widely varying quality of data sources for upper ocean temperatures.  To do this, it is necessary to automate at least part of the data screening process.  We still believe that all questionable data should be checked individually by a person familiar with the faults and features that can be present in upper ocean thermal (UOT) data.  Therefore, all data flagged as questionable by the screening programs are checked by hand to determine whether they are "good" or "bad".

The screening is designed to catch a maximum of "bad" data while minimizing the number of "good" profiles which must be examined by hand.  However, we consider it better to err on the side of caution, and so accept catching many "good" profiles in order to get most, if not all, of the "bad" ones.

This process has several steps:

step 1)  Eliminate Duplicate profiles from the database.

Whenever a duplicate is found, the best copy is kept.  This is determined either by the quality of the source or by the resolution of the data.  Source quality is ranked with data that has been QC'd by CSIRO highest and data from other sources progressively lower.  Source quality can be overridden only by differences in data resolution.  For example, if a bathy has a higher quality source than a full resolution XBT, the full resolution XBT is still the one retained.
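The selection rule above can be sketched as follows. This is a minimal illustration only, not the project's actual program: the Cast record, the SOURCE_RANK table and its source names, and the full_resolution flag are all hypothetical names introduced for the example.

```python
from dataclasses import dataclass

# Hypothetical source ranking (lower number = higher quality).
# Only "QC'd by CSIRO ranks highest" comes from the text; the other
# names are placeholders.
SOURCE_RANK = {"csiro_qc": 0, "other_archive": 1, "unknown": 2}


@dataclass
class Cast:
    cast_id: str
    source: str            # key into SOURCE_RANK
    full_resolution: bool  # True for full resolution data, False for e.g. a bathy


def best_copy(duplicates):
    """Return the copy to keep from a group of duplicate casts.

    Resolution is compared first, so a full resolution copy always wins;
    source rank only decides between copies of equal resolution.
    """
    return max(
        duplicates,
        key=lambda c: (c.full_resolution, -SOURCE_RANK[c.source]),
    )
```

For instance, given a bathy from the highest ranked source and a full resolution XBT from a lower ranked source, best_copy returns the XBT.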

An analysis of this step revealed that 0.04% of the casts remaining were still duplicates.  This low level will probably have minimal impact on any current use of the data.  In addition, 5 out of 1000 casts (0.5%) that were identified and eliminated as duplicates were really unique casts.  This loss is also considered insignificant since one (usually superior) copy is retained in the final data set.
 

We have two approaches to screening the remaining data for "bad" values:

step 2)  Gradient check:

An analysis of the data caught by the gradient checking routines showed that 83.4% of the data flagged actually contained "bad" data values.  In addition, only 0.11% of the data that was not flagged had gradient errors that the program had missed.  These errors were so small that catching them would have resulted in a much larger "catch" of good data.  Given that they were minor (less than 0.2 degrees C), they should not have much effect on most uses of the data.  This level of escape is considered acceptable.
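As a rough illustration of what a gradient check of this kind might look like, the sketch below flags depth intervals whose temperature gradient exceeds an absolute critical value. The function name and the critical value of 0.5 degrees C per metre are assumptions made for the example; the thresholds actually used by the CSIRO routines are not given here.

```python
def flag_gradients(depths, temps, critical=0.5):
    """Return indices of depth intervals with a suspicious gradient.

    depths   -- depths in metres, expected to increase down the cast
    temps    -- temperatures in degrees C at those depths
    critical -- hypothetical limit on |dT/dz| in degrees C per metre
    """
    flagged = []
    for i in range(len(depths) - 1):
        dz = depths[i + 1] - depths[i]
        if dz <= 0:
            # Depth did not increase: flag the interval for inspection.
            flagged.append(i)
            continue
        gradient = (temps[i + 1] - temps[i]) / dz
        if abs(gradient) > critical:
            flagged.append(i)
    return flagged
```

A cast with any flagged interval would then go to an operator for checking by hand rather than being rejected automatically.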

step 3) Statistical screening:

The parameters currently being screened are:

        SST,
        MLD (gradient),
        MLD (depth where SST - T is greater than 1 degree),
        T100 (temperature at 100m),
        T250,
        T(Z) (temperature of a depth surface),
        Z(T) (depth of a temperature surface),
        integrated raw T (cumulative),
        integrated binned T (also cumulative, binned by depth),
        integrated average T (integrated raw T divided by depth),
        DT/DZ (the temperature gradient by depth),
        DZ/DT (the inverse of the previous - similar to the gradient checking process).
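
To make a few of these parameters concrete, here is a minimal sketch of how T100, the SST-difference MLD and the integrated average T could be computed from one cast. It assumes depths increase strictly down the cast, that the shallowest sample stands in for SST, and that "divided by depth" means the depth range sampled; the function names are hypothetical.

```python
def temp_at_depth(depths, temps, z):
    """Linearly interpolate temperature at depth z; None if z is outside the cast."""
    for i in range(len(depths) - 1):
        z0, z1 = depths[i], depths[i + 1]
        if z0 <= z <= z1:
            frac = (z - z0) / (z1 - z0)
            return temps[i] + frac * (temps[i + 1] - temps[i])
    return None


def mixed_layer_depth(depths, temps, delta=1.0):
    """Shallowest sampled depth where SST - T exceeds delta degrees; None if never."""
    sst = temps[0]  # assumption: the first sample represents SST
    for z, t in zip(depths, temps):
        if sst - t > delta:
            return z
    return None


def integrated_average_temp(depths, temps):
    """Trapezoidal integral of T over depth, divided by the depth range sampled."""
    total = 0.0
    for i in range(len(depths) - 1):
        total += 0.5 * (temps[i] + temps[i + 1]) * (depths[i + 1] - depths[i])
    return total / (depths[-1] - depths[0])
```

T100 and T250 are then temp_at_depth(depths, temps, 100) and temp_at_depth(depths, temps, 250), which return None for casts that do not reach those depths.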

There is a bit of redundancy here, particularly with the DZ/DT parameter, but this flag is applied only if the cast lies more than 3 standard deviations from the mean, not if it exceeds an absolute critical value.
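
The 3 standard deviation rule can be sketched as below. This is an illustration under assumptions rather than the screening program itself: it supposes that a reference mean and standard deviation are already available for each parameter (how they are derived is not described here), and the function and dictionary layout are hypothetical.

```python
def screen_cast(params, stats, n_sigma=3.0):
    """Return the names of parameters lying more than n_sigma standard
    deviations from their reference mean.

    params -- {name: value} for one cast; a value of None means "not available"
    stats  -- {name: (reference_mean, reference_std)}, assumed precomputed
    """
    flags = []
    for name, value in params.items():
        if value is None or name not in stats:
            continue  # nothing to compare against
        ref_mean, ref_std = stats[name]
        if ref_std > 0 and abs(value - ref_mean) > n_sigma * ref_std:
            flags.append(name)
    return flags
```

With a reference SST of 28.0 and a standard deviation of 0.5, a cast reporting an SST of 30.0 lies 4 standard deviations out and is flagged, while one reporting 29.0 is not.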


The flagged casts from this program are again checked by hand.  Analysis showed that some parameters were more powerful than others.  A surface offset flag (unreasonable SST) correctly identified errors 68.4% of the time it was applied and, in many cases, this was the only test the cast failed.  T100 and T250 were also useful in catching "bad" data without also catching a lot of "good" data (21.6% and 54.5% of the casts flagged, respectively).  Others were less successful.

Overall, the combination of tests caught 93.1% of the casts containing "bad" data (bearing in mind that only 10.3% of the data overall had any "bad" data present and only 0.6% of the data had "bad" data that escaped detection by these programs).  The remaining 6.9% of the casts with bad data will be assessed to see if we can increase the percentage caught without unreasonably increasing the percentage of "good" data caught.  My feeling now is that these errors are minor and will not significantly affect the overall quality of the data set.

These procedures also caught 8.7% of the data unnecessarily.  I suspect that this "cost" of catching the bad data (looking at "good" data) is unavoidable, but it may be possible to tailor the statistical screening to allow for areas where extreme values are common (e.g., the Southern Ocean).  Unfortunately, these are also the areas where the instruments fail most often.  Checking data in these areas is relatively quick for an experienced operator and is probably worth the time to ensure that a maximum of "bad" data is eliminated.

Further details of the validation and testing procedures can be found here.