The Indian Ocean Project aims to build a high-quality Upper Ocean Thermal dataset which can ultimately be used to construct a new fine-resolution (x,y,z) Indian Ocean Climatology. This should allow close examination of the seasonal cycle of storage and advection in the upper ocean, and in particular the cycles of equatorial and boundary currents. Such a climatology could help provide a more accurate basis for model intercomparisons.
It should also allow exploration of interannual and decadal changes by extending the TOGA time series back before 1983. The 15-year TOGA record is still too short to examine possible links between the Indian Ocean and the Pacific, as it covers only three ENSO events (86/87, 91/92 and 97/98).
Given that the primary goal of the Indian Ocean Project is to create a high resolution seasonal (and, if possible, decadal) climatology of Indian Ocean Upper Ocean Temperatures, the first stage is to collect all available data and ensure that it is of the highest possible quality before analysis.
Initially, the focus will be on the TOGA/WOCE line IX1 (Fremantle, WA to Singapore). Current boundaries are 0 to 70 degrees South latitude and 90 to 145 degrees East longitude. This will allow development of the techniques, which will then be extended to the rest of the Indian Ocean and then to the Tasman Sea. We plan to construct a set of procedures which will allow easy QC and climatology development for any area of the world's oceans.
We now have 10 datasets for inclusion in the master database, of which six have been assembled so far. This master dataset will be used for development of the initial climatology and refinement of the statistical screening (see below).
A complete description of the processing flow and programs used for each stage can be found here.
Most of the datasets appear to overlap considerably with one another. Therefore the first stage of the project has been to build a duplicate-checking process to ensure that most (if not all) of the true duplicate profiles are excluded from the final dataset. Given that the data come from so many sources, this is not a simple task. The dataset often contains low-resolution copies of higher-resolution casts, which are hard to identify as such because of subtle differences in position and/or date and time. Manual checking of these near-duplicates has shown that the level of duplication is low, and we have decided to ignore them.
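To illustrate why near-duplicates are harder to catch than exact duplicates, the following is a minimal Python sketch of tolerance-based matching. It is not the project's actual duplicate-checking program: the Cast record, the function names, the 0.05 degree and one hour tolerances, and the use of the number of depth levels as a stand-in for resolution are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Cast:
    """Hypothetical minimal profile header (not the project's real format)."""
    cast_id: str
    lat: float          # degrees, south negative
    lon: float          # degrees east
    time: datetime
    n_levels: int       # number of depth levels; proxy for vertical resolution


def is_near_duplicate(a, b, max_deg=0.05, max_dt=timedelta(hours=1)):
    """True if two casts agree in position and time within the tolerances."""
    return (abs(a.lat - b.lat) <= max_deg
            and abs(a.lon - b.lon) <= max_deg
            and abs(a.time - b.time) <= max_dt)


def drop_near_duplicates(casts, max_deg=0.05, max_dt=timedelta(hours=1)):
    """Keep one cast per near-duplicate group, preferring more depth levels."""
    kept = []
    # Visit high-resolution casts first so they win over low-resolution copies.
    for cast in sorted(casts, key=lambda c: -c.n_levels):
        if not any(is_near_duplicate(cast, k, max_deg, max_dt) for k in kept):
            kept.append(cast)
    return kept
```

The pairwise comparison is quadratic in the number of casts, which is acceptable for a sketch; a real archive would more likely sort or bin the casts by date and position first. The tolerances would also need tuning, since values that are too loose merge genuinely distinct casts taken close together.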
Once duplicate checking and database construction are under way, the best-quality dataset (the six datasets combined so far) will be used to develop and test automated QC procedures based on statistical analysis and outlier detection. Three programs will be used to detect data errors.
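As a rough illustration of statistical screening, the Python sketch below flags temperatures that lie more than a chosen number of standard deviations from the mean at each depth. It is an assumed, simplified stand-in and does not describe the three programs mentioned above; the function name, the input layout and the three-sigma threshold are hypothetical.

```python
from statistics import mean, stdev


def flag_outliers(temps_by_depth, n_sigma=3.0):
    """Flag (cast_id, depth) pairs whose temperature is a statistical outlier.

    temps_by_depth maps a depth in metres to a list of
    (cast_id, temperature_degC) observations at that depth.
    """
    flagged = set()
    for depth, obs in temps_by_depth.items():
        values = [t for _, t in obs]
        if len(values) < 3:
            continue  # too few observations to estimate a spread
        mu = mean(values)
        sigma = stdev(values)
        if sigma == 0:
            continue  # all values identical; nothing to flag
        for cast_id, t in obs:
            if abs(t - mu) > n_sigma * sigma:
                flagged.add((cast_id, depth))
    return flagged
```

Because the outliers themselves inflate the standard deviation, a screen like this can miss bad values in small samples. Running it iteratively, or using a robust spread estimate such as the median absolute deviation, are common alternatives.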
A table of the known malfunctions and data errors can be found here, along with probable ways to detect these errors.
Following identification (and exclusion) of as much bad data as possible, it will almost certainly be necessary to manually check some casts where the statistical process cannot distinguish between real features (large inversion, for example) and malfunctions (moderate leakage). QUEST will be used to examine those casts in the context of their near-neighbours.
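Selecting the near-neighbours of a suspect cast can be sketched as follows. This is not how QUEST works internally, which the text does not describe; it is a minimal Python example under assumed definitions, where a neighbour is any other cast within a hypothetical 200 km great-circle radius and 15 days of the target. The Cast record and function names are invented for the illustration.

```python
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt
from typing import NamedTuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


class Cast(NamedTuple):
    """Hypothetical minimal profile header (not the project's real format)."""
    cast_id: str
    lat: float       # degrees, south negative
    lon: float       # degrees east
    time: datetime


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in kilometres between two positions in degrees."""
    p1, p2 = radians(lat1), radians(lat2)
    dphi = p2 - p1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(p1) * cos(p2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def near_neighbours(target, casts, radius_km=200.0, max_days=15):
    """Return other casts close to target in space and time, nearest first."""
    window = timedelta(days=max_days)
    found = []
    for c in casts:
        if c.cast_id == target.cast_id:
            continue  # skip the target itself
        d = great_circle_km(target.lat, target.lon, c.lat, c.lon)
        if d <= radius_km and abs(c.time - target.time) <= window:
            found.append((d, c))
    return [c for _, c in sorted(found, key=lambda pair: pair[0])]
```

The time window here compares absolute dates. For a seasonal comparison across years, the same test could instead be applied to the day of year, at the cost of handling the wrap-around at the end of December.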
Progress to date has been summarized. The next step is to check another month with particular attention to whether or not it will be possible to "catch" the remaining "bad" data...