Semi-Automated QC of Upper Ocean Thermal Data at CSIRO
The goal of the Indian Ocean Temperature Archive (IOTA) project is
to produce the best-quality data set possible, given the widely varying
quality of the sources of upper ocean temperature data. To do this,
it is necessary to automate at least part of the data screening process.
We still believe that ALL questionable data should be checked individually
by a person familiar with all the faults and features that may be
present in upper ocean thermal (UOT) data. Therefore, all data flagged as
questionable by the screening programs are checked by hand to determine
whether they are "good" or "bad".
The screening is designed to catch as much "bad" data as possible while
minimizing the number of "good" profiles that must be examined by hand.
However, we consider it better to err on the side of caution, so we accept
catching many "good" profiles in order to catch most, if not all, of the
"bad" ones.
This process has several steps:
step 1) Eliminate duplicate profiles from the database.
-
We first check for casts with identical date/time/position and data_type
information. These are considered exact duplicates, and the lowest-quality
cast is eliminated.
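A minimal sketch of the exact-duplicate check, assuming each cast is a Python dict and that `source_rank` is a hypothetical integer score (higher means a better source, with CSIRO-QC'd data ranked highest). The field names are illustrative, not the actual IOTA schema.

```python
def remove_exact_duplicates(casts):
    """Keep one cast per identical (date, time, lat, lon, data_type) key.

    Each cast is a dict; 'source_rank' is a hypothetical quality score
    (higher = better source).
    """
    best = {}
    for cast in casts:
        key = (cast["date"], cast["time"], cast["lat"], cast["lon"], cast["data_type"])
        # Keep whichever copy comes from the higher-ranked source.
        if key not in best or cast["source_rank"] > best[key]["source_rank"]:
            best[key] = cast
    return list(best.values())


casts = [
    {"id": "a", "date": "1985-03-14", "time": "0630", "lat": -32.5, "lon": 110.2,
     "data_type": "XBT", "source_rank": 1},
    {"id": "b", "date": "1985-03-14", "time": "0630", "lat": -32.5, "lon": 110.2,
     "data_type": "XBT", "source_rank": 3},
    {"id": "c", "date": "1985-03-15", "time": "1200", "lat": -30.0, "lon": 105.0,
     "data_type": "XBT", "source_rank": 2},
]
print(sorted(c["id"] for c in remove_exact_duplicates(casts)))  # ['b', 'c']
```

Casts "a" and "b" share every key field, so only the better-sourced "b" survives.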
-
If the cast passes this test, it is then compared to all other casts within
0.1 degree of latitude and longitude. This is necessary because of the small
errors in both position and time/date that we regularly see. Temperature
values are compared in pairs, and if more than 50% of the pairs
are equal, the casts are considered exact duplicates and only the
highest-quality cast is kept.
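The near-duplicate test can be sketched as follows. This assumes that "compared in pairs" means the i-th temperature of one cast against the i-th of the other, and that "equal" means identical within a tolerance `tol`; both readings, and the function name, are assumptions. Longitude wrap-around at 180 degrees is ignored.

```python
def is_near_duplicate(a, b, max_sep_deg=0.1, min_match_frac=0.5, tol=0.0):
    """Return True if casts a and b look like copies of the same cast.

    a, b: dicts with 'lat', 'lon' and 'temps' (temperatures in profile order).
    Assumption: temps[i] of one cast is paired with temps[i] of the other,
    and a pair counts as equal when |difference| <= tol.
    """
    # Only casts within 0.1 degree in both latitude and longitude are compared.
    if abs(a["lat"] - b["lat"]) > max_sep_deg or abs(a["lon"] - b["lon"]) > max_sep_deg:
        return False
    pairs = list(zip(a["temps"], b["temps"]))
    if not pairs:
        return False
    matches = sum(1 for ta, tb in pairs if abs(ta - tb) <= tol)
    # More than 50% of the pairs must be equal.
    return matches / len(pairs) > min_match_frac


a = {"lat": -32.50, "lon": 110.20, "temps": [25.1, 24.8, 22.0, 18.5]}
b = {"lat": -32.55, "lon": 110.25, "temps": [25.1, 24.8, 22.0, 18.9]}
print(is_near_duplicate(a, b))  # True (3 of 4 pairs equal)
```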
-
Because identical casts are often present at different resolutions
(high-resolution XBT data vs. bathy data, for example), it is then necessary
to compare the temperatures at comparable depths. Again, if more than
50% match, the lower-quality cast is eliminated.
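For casts at different resolutions, one way to find "comparable depths" is to match each level of the coarser cast to the nearest level of the finer one. The depth and temperature tolerances below are illustrative guesses, not the values used at CSIRO.

```python
import bisect


def matches_at_comparable_depths(low_res, high_res, depth_tol=2.0, temp_tol=0.05,
                                 min_match_frac=0.5):
    """Compare a coarse profile (e.g. bathy) with a fine one (e.g. full-res XBT).

    Each profile is a list of (depth_m, temp_C) pairs sorted by depth.
    depth_tol and temp_tol are illustrative assumptions.
    """
    hr_depths = [d for d, _ in high_res]
    compared = matched = 0
    for depth, temp in low_res:
        i = bisect.bisect_left(hr_depths, depth)
        # The nearest fine-resolution level is on one side of the insertion point.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(hr_depths)]
        if not candidates:
            continue
        j = min(candidates, key=lambda k: abs(hr_depths[k] - depth))
        if abs(hr_depths[j] - depth) > depth_tol:
            continue  # no comparable depth in the other cast
        compared += 1
        if abs(high_res[j][1] - temp) <= temp_tol:
            matched += 1
    # More than 50% of the comparable levels must agree.
    return compared > 0 and matched / compared > min_match_frac


bathy = [(0, 25.0), (50, 22.1), (100, 18.0), (200, 14.2)]
xbt = [(0.5, 25.01), (49.6, 22.12), (100.4, 17.5), (199.8, 14.21)]
print(matches_at_comparable_depths(bathy, xbt))  # True: 3 of 4 levels agree
```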
Whenever a duplicate is found, the best copy is kept. This is determined
either by the quality of the source or by the resolution of the data.
Data that have been QC'd by CSIRO are ranked highest, and data from other
sources are ranked progressively lower. Source quality can be overridden
only by differences in data resolution. For example, if a bathy cast comes
from a higher-quality source than a full-resolution XBT cast, the
full-resolution XBT cast is still the one retained.
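The retention rule can be expressed as a sort key in which resolution is compared before source rank, so resolution always overrides source quality. `RESOLUTION_RANK` and the field names are hypothetical.

```python
# Hypothetical resolution ranking: full-resolution data beat coarse bathy data.
RESOLUTION_RANK = {"full": 2, "bathy": 1}


def best_copy(duplicates):
    """Pick the copy to keep from a group of duplicate casts.

    Tuples compare element by element, so resolution decides first and the
    source rank (higher = better, CSIRO-QC'd highest) only breaks ties.
    """
    return max(duplicates,
               key=lambda c: (RESOLUTION_RANK[c["resolution"]], c["source_rank"]))


group = [
    {"id": "bathy_csiro", "resolution": "bathy", "source_rank": 3},
    {"id": "xbt_other", "resolution": "full", "source_rank": 1},
]
print(best_copy(group)["id"])  # xbt_other
```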
An analysis of this step revealed that 0.04% of the remaining casts were
still duplicates. This low level will probably have minimal impact
on any current use of the data. In addition, 5 out of
1000 casts (0.5%) that were identified and eliminated as duplicates were
really unique casts. This loss is also considered insignificant, since
one (usually superior) copy is retained in the final data set.
We have two approaches to screening the remaining data
for "bad" values:
step 2) Gradient check:
-
First, we run a program that identifies unreasonable gradients in the data.
It looks both for single spikes and for longer (unreasonable) inversions.
If such data are found, the cast is flagged for hand-checking. Any gradient
(change in depth / change in temperature, dZ/dT) between -0.4 and 12.5 is
flagged. These critical values were derived by checking full-resolution data
for failures and then calculating the gradients. Other data sets were
then screened to see how effective these values were at finding all or
most of the "bad" data, and the values were adjusted accordingly.
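A sketch of the gradient test, assuming depth in metres (increasing downward) and temperature in degrees C. With dZ/dT in (-0.4, 12.5), a small negative value corresponds to a temperature drop sharper than 2.5 degrees C per metre, and a small positive value to an inversion stronger than 0.08 degrees C per metre; isothermal layers (dT = 0) have infinite dZ/dT and pass. Treating the bounds as strict is an assumption.

```python
def gradient_failures(depths, temps, lower=-0.4, upper=12.5):
    """Return indices i where the layer between samples i and i+1 fails.

    The gradient is dZ/dT (change in depth / change in temperature), with
    depth increasing downward. Layers with lower < dZ/dT < upper are flagged.
    """
    failures = []
    for i in range(len(depths) - 1):
        dz = depths[i + 1] - depths[i]
        dt = temps[i + 1] - temps[i]
        if dt == 0:
            continue  # isothermal layer: dZ/dT is infinite, never flagged
        if lower < dz / dt < upper:
            failures.append(i)
    return failures


depths = [0, 10, 20, 30, 40, 41]
temps = [25.0, 24.9, 28.0, 24.5, 24.0, 20.0]
print(gradient_failures(depths, temps))  # [1, 4]
```

Layer 1 is a 3.1 degree inversion over 10 m and layer 4 a 4 degree drop over 1 m; the gentle layers pass.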
-
If a cast has failed the gradient test, it is then checked for spikes of
single values where the temperature difference is greater than 0.4 degrees
C. These are removed by replacing the erroneous value with a linearly
interpolated one and adding the appropriate flag (SP).
Though this is not as reliable as other parts of the program, it is useful
because it minimizes the number of spikes that must be dealt with when
hand-checking the failures. Restoring a value that has been badly
interpolated is simple for the very few casts where it is necessary.
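A sketch of the single-spike repair. The text gives the 0.4 degree C threshold but not the exact spike definition, so this assumes a spike is a value that differs by more than the threshold from both neighbours in the same direction; the replacement is linear interpolation in depth between those neighbours.

```python
def remove_single_spikes(depths, temps, threshold=0.4):
    """Replace single-value spikes by linear interpolation in depth.

    Returns (fixed_temps, flagged_indices); flagged indices would get 'SP'.
    Neighbour comparisons use the original values, so only isolated
    single-point spikes are targeted.
    """
    fixed = list(temps)
    flagged = []
    for i in range(1, len(temps) - 1):
        up = temps[i] - temps[i - 1]
        down = temps[i] - temps[i + 1]
        # A spike sticks out from both neighbours in the same direction.
        if (up > threshold and down > threshold) or (up < -threshold and down < -threshold):
            frac = (depths[i] - depths[i - 1]) / (depths[i + 1] - depths[i - 1])
            fixed[i] = temps[i - 1] + frac * (temps[i + 1] - temps[i - 1])
            flagged.append(i)
    return fixed, flagged


fixed, flagged = remove_single_spikes([0, 10, 20, 30], [25.0, 24.8, 27.0, 24.4])
print(flagged)  # [2]
```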
-
This program also finds wire breaks at the end of traces and rejects them,
applying the appropriate flag (WB). Finally, it checks for bad values
in BATHY data (missing values that have not been correctly caught) and
automatically replaces them with missing values, again adding the
appropriate flag (BB).
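The wire-break and BATHY checks might look like the sketch below. The text does not give the criteria, so the plausible temperature range and the fill values in `BAD_SENTINELS` are hypothetical placeholders; a trailing run of out-of-range values is treated as a wire break.

```python
# Illustrative assumptions only; the CSIRO criteria are not given in the text.
PLAUSIBLE_RANGE = (-2.5, 36.0)           # degrees C
BAD_SENTINELS = {99.99, -99.99, 999.9}   # hypothetical fill values left unconverted
MISSING = None


def trim_wire_break(temps):
    """Cut a trailing run of implausible values (treated as a wire break).

    Returns (kept_temps, break_index); break_index is None when no 'WB'
    flag is needed.
    """
    lo, hi = PLAUSIBLE_RANGE
    end = len(temps)
    while end > 0 and not (lo <= temps[end - 1] <= hi):
        end -= 1
    return temps[:end], (end if end < len(temps) else None)


def fix_bathy_bad_values(temps):
    """Replace leftover fill values in BATHY data with MISSING ('BB' flag).

    Returns (fixed_temps, flagged_indices).
    """
    fixed, flagged = [], []
    for i, t in enumerate(temps):
        if t in BAD_SENTINELS:
            fixed.append(MISSING)
            flagged.append(i)
        else:
            fixed.append(t)
    return fixed, flagged


print(trim_wire_break([25.0, 20.1, 15.3, 40.2, 41.0]))  # ([25.0, 20.1, 15.3], 3)
print(fix_bathy_bad_values([25.0, 99.99, 18.2]))       # ([25.0, None, 18.2], [1])
```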
-
All failures from this screening are checked by hand, and bad data are
rejected with the appropriate flag (which depends on the reason for the
failure). An analysis of the data caught by the gradient-checking routines
showed that 83.4% of the flagged data actually contained "bad" values.
In addition, only 0.11% of the data that were not flagged had gradient
errors that the program missed. These errors were so small that catching
them would have resulted in a much larger "catch" of good data, and since
they were minor (less than 0.2 degrees C), they should have little effect
on most uses of the data. This level of escape is considered acceptable.
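The two figures quoted above are different rates: the share of flagged casts that were really bad, and the share of unflagged casts with escaped errors. A small helper makes the distinction explicit, along with the detection rate and "unnecessary catch" rate used for the statistical screening. It assumes hand-check outcomes are tallied per cast; the counts in the usage line are invented for illustration only.

```python
def screening_rates(flagged_bad, flagged_good, unflagged_bad, unflagged_good):
    """Summarise a screening run from hand-check outcome counts (casts).

    flagged_bad:    flagged by the program and confirmed bad by hand
    flagged_good:   flagged but found to be good on inspection
    unflagged_bad:  not flagged, but later found to contain bad values
    unflagged_good: not flagged and good
    """
    flagged = flagged_bad + flagged_good
    unflagged = unflagged_bad + unflagged_good
    total = flagged + unflagged
    return {
        # Share of flagged casts that really contained bad data.
        "hit_rate": flagged_bad / flagged,
        # Share of unflagged casts whose errors escaped detection.
        "escape_rate": unflagged_bad / unflagged,
        # Share of all bad casts that the screening caught.
        "detection_rate": flagged_bad / (flagged_bad + unflagged_bad),
        # Share of all casts that were good but still had to be hand-checked.
        "unnecessary_rate": flagged_good / total,
    }


for name, rate in screening_rates(80, 20, 1, 899).items():
    print(f"{name}: {rate:.1%}")
```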
step 3) Statistical screening:
-
Finally, we run a set of programs designed to compare parameters calculated
for each cast with a mean and standard deviation calculated for a box around
that cast. Only "good" data are included, so that data which have already
been eliminated do not corrupt the statistics. Values more than
3 standard deviations from the mean for a given parameter are given the
relevant flag and then checked by hand. Both the parameters (for each
cast) and the statistics (boxed by latitude and longitude) are stored in
netCDF files, which can be used for other purposes.
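A sketch of the boxed statistics and the 3-standard-deviation test. The box size (`BOX_DEG`) is a hypothetical choice, the use of the sample standard deviation is an assumption, and only casts marked good contribute to the statistics, as described above.

```python
import math
import statistics
from collections import defaultdict

BOX_DEG = 1.0  # hypothetical box size in degrees; not stated in the text


def box_of(lat, lon, size=BOX_DEG):
    """Index of the lat/lon box containing a position."""
    return (math.floor(lat / size), math.floor(lon / size))


def box_statistics(casts, param):
    """Mean and sample standard deviation of one parameter per box.

    Only casts still marked good are used, so rejected data cannot skew
    the statistics.
    """
    values = defaultdict(list)
    for cast in casts:
        if cast["good"]:
            values[box_of(cast["lat"], cast["lon"])].append(cast[param])
    # stdev needs at least two values; sparse boxes are handled by the NA rule.
    return {box: (statistics.mean(v), statistics.stdev(v))
            for box, v in values.items() if len(v) >= 2}


def outside_three_sigma(cast, param, stats):
    """True if the cast's parameter is more than 3 standard deviations from its box mean."""
    mean, std = stats[box_of(cast["lat"], cast["lon"])]
    return abs(cast[param] - mean) > 3 * std


sst_values = [24.1, 24.5, 23.9, 24.3, 24.0, 24.4, 24.2, 23.8, 24.6, 24.2]
casts = [{"lat": -20.3, "lon": 75.6, "sst": s, "good": True} for s in sst_values]
casts.append({"lat": -20.5, "lon": 75.1, "sst": 35.0, "good": False})  # already rejected
stats = box_statistics(casts, "sst")
print(outside_three_sigma({"lat": -20.9, "lon": 75.2, "sst": 26.0}, "sst", stats))  # True
```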
The parameters currently being screened are:
SST,
MLD (gradient),
MLD (depth where SST - T is greater than 1 degree C),
T100 (temperature at 100 m),
T250 (temperature at 250 m),
T(Z) (temperature on a depth surface),
Z(T) (depth of a temperature surface),
integrated raw T (cumulative),
integrated binned T (also cumulative, binned by depth),
integrated average T (integrated raw T divided by depth),
DT/DZ (the temperature gradient with depth),
DZ/DT (the inverse of the previous, similar to the gradient-checking process).
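Several of these parameters can be computed from a single profile with linear interpolation and the trapezoidal rule. This sketch assumes the first sample is at the surface and stands in for SST, uses the 20 degree C isotherm as an example Z(T), and omits the gradient-based MLD and the binned integral, whose exact definitions are not given here.

```python
def interp_at(x, xs, ys):
    """Linearly interpolate ys at x (xs increasing); None if x is out of range."""
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            frac = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + frac * (ys[i + 1] - ys[i])
    return None


def depth_of_isotherm(target, depths, temps):
    """Z(T): shallowest depth at which the profile crosses `target` degrees C."""
    for i in range(len(depths) - 1):
        t0, t1 = temps[i], temps[i + 1]
        if t0 != t1 and min(t0, t1) <= target <= max(t0, t1):
            frac = (target - t0) / (t1 - t0)
            return depths[i] + frac * (depths[i + 1] - depths[i])
    return None


def cast_parameters(depths, temps):
    """A few screening parameters for one profile (depth in m, T in degrees C)."""
    sst = temps[0]  # assumption: the shallowest sample represents SST
    # Running integral of T over depth (trapezoidal rule), i.e. "cumulative".
    cumulative = [0.0]
    for i in range(len(depths) - 1):
        layer = (depths[i + 1] - depths[i]) * (temps[i] + temps[i + 1]) / 2
        cumulative.append(cumulative[-1] + layer)
    return {
        "SST": sst,
        # MLD, difference criterion: depth where SST - T first reaches 1 degree C.
        "MLD_dT": depth_of_isotherm(sst - 1.0, depths, temps),
        "T100": interp_at(100.0, depths, temps),
        "T250": interp_at(250.0, depths, temps),
        "Z20": depth_of_isotherm(20.0, depths, temps),  # example Z(T)
        "int_T_cumulative": cumulative,
        # Integrated average T: integrated raw T divided by the depth span.
        "avg_T": cumulative[-1] / (depths[-1] - depths[0]),
    }


depths = [0, 25, 50, 100, 150, 250, 400]
temps = [27.0, 26.9, 26.2, 22.0, 18.0, 13.5, 10.0]
params = cast_parameters(depths, temps)
print(params["Z20"])  # 125.0
```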
There is a bit of redundancy here, particularly with the DZ/DT parameter,
but this flag is only applied if the cast falls more than 3 standard
deviations from the mean, not if it exceeds an absolute critical value.
-
If a cast is from an area of low data density, it may not be possible to
calculate a valid mean and standard deviation for comparison (if there
were fewer than 10 casts in a box, no statistics were calculated).
In this case, the cast receives the flag "NA" (not assessed) and is also
checked by hand.
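The low-density rule slots in ahead of the 3-standard-deviation comparison. In this sketch the labels "FLAG" and "PASS" are hypothetical; only "NA" and the threshold of 10 casts come from the text.

```python
import statistics

MIN_CASTS_PER_BOX = 10  # from the text: boxes with fewer casts get no statistics


def assess_parameter(value, box_values, n_sigma=3.0):
    """Classify one cast parameter against the good casts in its box.

    Returns "NA" (not assessed) when the box is too sparse, "FLAG" when the
    value is more than n_sigma standard deviations from the box mean, and
    "PASS" otherwise. Both "NA" and "FLAG" casts go to hand-checking.
    """
    if len(box_values) < MIN_CASTS_PER_BOX:
        return "NA"
    mean = statistics.mean(box_values)
    std = statistics.stdev(box_values)
    return "FLAG" if abs(value - mean) > n_sigma * std else "PASS"


print(assess_parameter(26.0, [24.1, 24.5, 24.0]))  # NA
```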
The flagged casts from this program are again checked by hand.
Analysis showed that some parameters were more powerful than others.
A surface offset flag (unreasonable SST) correctly identified errors 68.4%
of the time it was applied, and in many cases this was the only test the
cast failed. T100 and T250 were also useful in catching "bad" data
without also catching a lot of "good" data (21.6% and 54.5% of the casts
flagged, respectively). Other parameters were less successful.
Overall, the combination of tests caught 93.1% of the casts containing
"bad" data (bearing in mind that only 10.3% of the data overall had any
"bad" data present and only 0.6% of the data had "bad" data that escaped
detection by these programs). The remaining 6.9% of the casts with
"bad" data will be assessed to see whether we can increase the percentage
caught without unreasonably increasing the percentage of "good" data caught.
My feeling now is that these errors are minor and will not significantly
affect the overall quality of the data set.
These procedures also caught 8.7% of the data unnecessarily.
I suspect that this "cost" of catching the bad data (looking at "good" data)
is unavoidable, but it may be possible to tailor the statistical screening
to allow for areas where extreme values are common (e.g., the Southern
Ocean). Unfortunately, these are also the areas where the instruments
fail most often. Checking data in these areas is relatively quick
for an experienced operator and is probably worth the time to ensure that
as much "bad" data as possible is eliminated.
Further details of the validation and testing procedures can be found here.