I have run some tests of the gradient checking routines and the results
are as follows:
1185 records were flagged out of 9447 in the January master file (12.5%).
I then hand-checked 100 of these records to determine how many were really bad and how many were really good data. 40% were actually bad data, 55% were OK when checked (mainly in the Southern Ocean, and there isn't much I can do about them kicking out...), and 5% were bathys with bad data in them.
Looking at the file as a whole, and extrapolating the sample to the entire data set, 6.8% of the good data was caught unnecessarily. These numbers may shift when I do a larger analysis.
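For reference, the arithmetic behind these figures can be roughly checked as below. This is only a sketch, assuming the 40/55/5 split in the 100-record sample holds across all 1185 flagged records; the variable names are made up for illustration. The denominator behind the 6.8% figure is not stated, so two candidates are shown and neither is claimed to be the original calculation.

```python
# Rough check of the flag-rate arithmetic. Names are illustrative only.
TOTAL_RECORDS = 9447   # records in the January master file
FLAGGED = 1185         # records kicked out by the gradient check
SAMPLE_SIZE = 100      # flagged records that were hand-checked
SAMPLE_OK = 55         # of those, records that were actually good

flag_rate = FLAGGED / TOTAL_RECORDS

# Assumption: the sample split applies to every flagged record.
false_flags = FLAGGED * SAMPLE_OK / SAMPLE_SIZE
true_flags = FLAGGED - false_flags

# Candidate 1: good-but-flagged records as a share of ALL records.
share_of_all = false_flags / TOTAL_RECORDS

# Candidate 2: as a share of the estimated good records
# (total minus flagged-and-bad, ignoring any bad records that were missed).
share_of_good = false_flags / (TOTAL_RECORDS - true_flags)

print(f"flag rate:              {flag_rate:.1%}")
print(f"good records flagged:   {false_flags:.0f} (estimated)")
print(f"share of all records:   {share_of_all:.1%}")
print(f"share of good records:  {share_of_good:.1%}")
```

Both candidates land near 7%, in the same neighborhood as the 6.8% quoted above; the small difference depends on rounding and on which denominator is used.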
I then went through the file to see how many "bad" casts were missed by the gradient check. I looked at 1764 records, which brought me to the end of the 100 kicked-out records I had hand-checked (they're obviously not evenly distributed through the file...). Of the 1764 records checked, 6 had unflagged bad data, but only 2 of these were gradient problems or otherwise catchable by a gradient check.
To get these remaining 2 records, I would have to make the trap so stringent that it would kick out an immense amount of "good" data as well.
SO - the bottom line is that the gradient check is catching 95.24% of the bad data that has a gradient problem, and the remaining 4.76% is not catchable without an unreasonably stringent trap. With luck, it will kick out in the statistical tests and, if it doesn't, it probably doesn't matter anyway.
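The 95.24% and 4.76% figures can be reproduced with the short sketch below. It assumes the catch rate was computed from the 40 bad records found among the 100 hand-checked kick-outs, plus the 2 missed gradient-problem records in the same 1764 records; that derivation is an inference from the numbers, not something spelled out above, and the variable names are illustrative.

```python
# Reproduce the catch rate under the assumption described above.
caught = 40  # bad records among the 100 hand-checked flagged records
missed = 2   # unflagged records with a gradient problem in the same span

gradient_bad_total = caught + missed
catch_rate = caught / gradient_bad_total
miss_rate = missed / gradient_bad_total

print(f"caught: {catch_rate:.2%}")  # caught: 95.24%
print(f"missed: {miss_rate:.2%}")   # missed: 4.76%
```

With only 42 records in the denominator, one more or one fewer missed cast would move the rate by a couple of percentage points, so the figure is likely to shift with a larger analysis.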