The data acquisition and assembly components of this project are time consuming, and a time frame of 3-5 years, rather than the 18 months available, was needed to incorporate all data sets. Consequently, our methodology was fast-tracked to meet this extremely tight deadline.
In order to generate the species polygons, the bounds for the species were clipped with bathymetric contours generated from a number of sources detailed below.
Dataset 1 (DS1), provided by Dr Neil Hamilton of CSIRO Division of Wildlife and Ecology, was a 1:100000 series of contour lines at depths of 0, 20, 50, 100, 150, 200, 250 and 300 metres. The sizes of the contour maps range from 12 megabytes (Mb) for the coast down to 2Mb for the 300 metres. Unfortunately this dataset contained gaps in which there were no data and numerous contours which were not closed ("dangling nodes").
Dataset 2 (DS2) was a commercially available set of contour lines (GEBCO) at depths of 0, 50, 100, 200, 300, 400 and 500 metres and thereafter in increments of 500 metres. In the area of interest, except for the coastline, those contours below 200m were deemed to be unusable due to the lack of data.
Dataset 3 (DS3) was a point dataset obtained from the Royal Australian Navy Hydrographic Office to supplement those areas, mainly the Cape York region, which were missing from the other two datasets.
DS1 was considered to be highly accurate; however, a spheroid of projection was not given, and as such the dataset could be considered accurate at best to within +/- 200m. DS2 had a spheroid of WGS84; however, when overlaid with DS1 and a position of known location, this dataset was inaccurate by approximately 500m. DS3 could not be checked as no exact position was known near its bounds.
Due to the gaps in the two contour line based datasets, it was decided to combine all three datasets into one coherent set of contour lines. Initially it appeared that the best way to do this was to generate a TIN (Triangular Irregular Network) from which the contour lines could then be extracted. Various subsets of the three datasets were used to generate TINs, with varied results. Unfortunately, due to the algorithmic difficulty involved in generating a TIN from point and line data sources, even the best of the resultant TINs were less than satisfactory for the majority of areas.
It was therefore decided to use the best from each dataset and patch a segment of the TIN into those areas which were missing and considered too large to be simply connected by a line. Boxes around the areas which required patching were drawn. A contour line at the required depth was extracted from the TIN. The contour line was then clipped with the boxes and cleaned up using ArcEdit. The resultant coverage was then added to the original coverage. This allowed for a consistent/accurate contour line except for the areas previously missing which were filled with the interpolated TIN data. The contour lines chosen for the final bathymetry set were 0, 20, 50, 100, 200, 500, 1000, 2000 and 3000 metres as reasonable bathymetric information was already available for these depths.
Due to problems of storing and generating the species polygons, it was decided to decrease the resolution of the contour lines. This was done by iteratively 'generalising and dissolving' the contour lines until their size fell below a predetermined level. For our purposes this was chosen to be a total coverage size of 200 kilobytes. Since generalising a coverage removed only vertices and not the nodes of an arc, some of the resultant coverages were found to be inconsistent with adjacent coverages; in some cases, for example, the 20m and 50m contour lines cross. It should be noted, however, that these problems were also apparent in the original datasets.
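The iterative generalisation described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the ARC/INFO procedure actually used: a plain Douglas-Peucker simplification stands in for the 'generalise' step, a total vertex count stands in for the 200 kilobyte coverage size, and the names `simplify`, `generalise_to_budget`, `tol` and `growth` are hypothetical.

```python
import math

def _perp_dist(p, a, b):
    """Distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:          # closed arc: a and b coincide
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / math.hypot(dx, dy)

def simplify(arc, tol):
    """Douglas-Peucker: drop vertices within tol of the chord.
    The two end nodes of the arc are always kept."""
    if len(arc) < 3:
        return list(arc)
    dists = [_perp_dist(p, arc[0], arc[-1]) for p in arc[1:-1]]
    i = max(range(len(dists)), key=dists.__getitem__) + 1
    if dists[i - 1] <= tol:
        return [arc[0], arc[-1]]
    return simplify(arc[: i + 1], tol)[:-1] + simplify(arc[i:], tol)

def generalise_to_budget(arcs, max_vertices, tol=0.001, growth=2.0, max_rounds=40):
    """Increase the tolerance until the total vertex count fits the budget
    (assumed stand-in for the coverage file size)."""
    for _ in range(max_rounds):
        if sum(len(a) for a in arcs) <= max_vertices:
            break
        arcs = [simplify(a, tol) for a in arcs]
        tol *= growth
    return arcs
```

Because each arc is simplified independently and its end nodes never move, neighbouring contours can end up crossing, which is consistent with the inconsistencies noted above.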
Since the generation of the species polygons involved clipping and erasing with the bathymetric contour lines, some indication of the slope was required to indicate regions. A coverage containing regions still maintains polygon topology, except that meaning is given to those polygons which are considered part of the region. That is, a region is typically a subset of the polygons. An example of this is the 0m and 20m contour lines for Tasmania, where only the 0m to 20m region is considered a valid area, i.e. the area inside the 0m contour is invalid. In order to generate regions from polygons, some indication of which polygons to include is needed, and this was done by adding an attribute (flag) to the bathymetry coverages. The flag was one for those polygons which contained depths less than the coverage's stated depth and zero for all others.
The bathymetry coverages were then used as clip-and-erase polygons with the bounding box. All polygons in the resultant polygon coverage with a flag value not equal to zero were chosen as being part of the region.
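As a rough illustration of the flag logic, the sketch below treats each polygon as having a single representative depth (negative for land), which is an assumption made for the example; the real coverages carry polygon topology rather than a depth value. The names `flag_for` and `depth_band_region` are hypothetical.

```python
def flag_for(depth_m, stated_depth_m):
    """1 if the polygon is shallower than the coverage's stated depth, else 0."""
    return 1 if depth_m < stated_depth_m else 0

def depth_band_region(polygons, min_depth_m, max_depth_m):
    """polygons: {name: representative depth in metres, negative for land}.
    Clip with the maximum-depth coverage, then erase with the minimum-depth one."""
    clipped = {n: d for n, d in polygons.items() if flag_for(d, max_depth_m) != 0}
    return sorted(n for n, d in clipped.items() if flag_for(d, min_depth_m) == 0)
```

For the Tasmanian example, `depth_band_region({"land": -5.0, "shallow": 10.0, "deep": 35.0}, 0.0, 20.0)` keeps only the polygon between the 0m and 20m contours.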
The limitation of this technique is that it is only workable for areas near to the coast. This is because the assumption is that the maximum depth is used as the clipping polygon and the minimum depth is used as the erasing polygon. As such all data outside the maximum depth polygon are removed.
For the purposes of the string analysis (described below) a sample coastline from the mainland of Australia and Tasmania was extracted from the weeded bathymetry data. A route system for each area was then created using ArcEdit. Although the documentation suggested that the creation of a circular route system was possible further investigation revealed that a problem existed in determining a point of reference. This made it necessary to 'cut' the arc in order to define the start of the route system. ArcEdit was used to select the arcs and create a route system. The point chosen for the mainland was near the tip of Queensland, just to the west of Cape York. For Tasmania a cut in the Tamar River was chosen as the origin.
The output from the string generation consisted of lines containing the species number, start position and end position. There were multiple lines for some species with disjoint distributions, or where the bathymetry contour line cut the box in more than two places. Species which spanned the 'cut' were shown as two lines, with the ending point of the first line being the length of the coastline and the starting point of the second line being zero.
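The handling of species that span the 'cut' can be sketched as below. The function name and the convention that a spanning range arrives with its start position greater than its end position are assumptions made for illustration; the source does not describe the internal representation.

```python
def string_records(species, start, end, coast_length):
    """Return the output lines (species, start, end) for one coastal range.
    A range that spans the cut (assumed here to be given with start > end)
    is split into two lines, as described in the text."""
    if start <= end:
        return [(species, start, end)]
    return [(species, start, coast_length), (species, 0.0, end)]
```

With a hypothetical coastline of length 1000.0, a range running from 950.0 across the cut to 40.0 is written as one line ending at 1000.0 and a second line starting at 0.0.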
The initial algorithm used an AML script to determine the intersections of the coastline with the bounding box defined in the Oracle database. These intersection points, which were in units of degrees along the coast, were then written out to a file. Any polygons which did not intersect with the coast were logged. The species from the BioTax '96 Workshop were handled in a similar manner, except that the coverages used for the intersection were entered interactively and as such were not restricted to boxes.
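A minimal sketch of the intersection step is given below. It is not the AML script itself: it assumes the coastline is a simple list of (longitude, latitude) vertices starting at the cut, measures distance in degrees along that polyline, and uses Liang-Barsky segment clipping as a stand-in for the GIS intersection. The names `clip_segment` and `coast_intervals` are hypothetical.

```python
import math

def clip_segment(p, q, box):
    """Liang-Barsky: return (t0, t1) for the part of segment p->q inside
    box = (xmin, ymin, xmax, ymax), or None if the segment misses the box."""
    xmin, ymin, xmax, ymax = box
    t0, t1 = 0.0, 1.0
    for d, lo, hi in ((q[0] - p[0], xmin - p[0], xmax - p[0]),
                      (q[1] - p[1], ymin - p[1], ymax - p[1])):
        if d == 0:
            if lo > 0 or hi < 0:      # parallel to this pair of edges and outside
                return None
            continue
        ta, tb = lo / d, hi / d
        if ta > tb:
            ta, tb = tb, ta
        t0, t1 = max(t0, ta), min(t1, tb)
        if t0 > t1:
            return None
    return t0, t1

def coast_intervals(coast, box):
    """Return [(start, end), ...]: stretches of coast, measured in degrees
    along the polyline from its first vertex, that fall inside the box."""
    intervals, measure = [], 0.0
    for p, q in zip(coast, coast[1:]):
        seg_len = math.hypot(q[0] - p[0], q[1] - p[1])
        hit = clip_segment(p, q, box)
        if hit is not None:
            start, end = measure + hit[0] * seg_len, measure + hit[1] * seg_len
            if intervals and math.isclose(intervals[-1][1], start, abs_tol=1e-12):
                intervals[-1] = (intervals[-1][0], end)   # extend previous stretch
            else:
                intervals.append((start, end))
        measure += seg_len
    return intervals
```

An empty result corresponds to a box that does not intersect the coast, which is the case the text says was logged.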
Compilation:
The GIS currently comprises data gathered on Australian fish distributions, represented by ranges and points which can be displayed on map templates. The data were captured in the form of latitude and longitude co-ordinates, depths, other information associated with the records, and record sources (see below). The initial data were entered into an Excel spreadsheet. The information was then periodically written out to a delimited text file and imported into Oracle. Species are tied to relevant information by a numerical coding system (CAAB) used broadly by Australian fisheries agencies. There is potential for expanding the GIS to incorporate other types of information relating to species, or for using captured information for other purposes.
Currently the database contains 7540 polygons and 3229 point records, representing a total of 3007 distinct species. These data were assembled through examination of the information contained in scientific literature selected by Peter Last and Martin Gomon. A reference list for this literature appears in section 19 of this report.
For each literature record of a species' distribution, the information used to create the distribution was entered into the database in the following fields:
id        record identification number in the Oracle database
name      species name
species   species number from the CAAB list
ref       identification number and author(s) of the source for this information (e.g. journal article, book) (please find enclosed list of references)
lata      latitude of point A in degrees.minutes (see diagram below)
longa     longitude of point A in degrees.minutes (see diagram below)
desca     description of point A (e.g. Kangaroo Island, NETas)
gradea    reliability of point A (1=lat/long given in source, 2=lat/long description given in source and lat/long derived from gazetteer, 3=no lat/long or description given, so coordinates chosen arbitrarily)
latb      latitude of point B (see diagram at right)
longb     longitude of point B (see diagram at right)
descb     description of point B
gradeb    reliability of point B (as for gradea)
depthmin  minimum of depth interval, in metres
depthmax  maximum of depth interval, in metres
ecol      ecological classification (e=estuarine, cm=coastal marine, sd=shelf demersal, sp=shelf pelagic, sld=slope demersal, ep=epi-pelagic, mb=meso-bathypelagic, a=abyssal)
disttype  distribution type (0=normal, 1=larval/juvenile, 2=reproductive, 3=extralimital)
remarks   any queries or problems with source or information (for example, depth? in this field indicates that the depth range is considered suspect)
lastmod   date of most recent modification of this record
mod       what the most recent modification consisted of
quality   not used
In most cases we have more than one reference or source of information for each species.
Points A and B are used by the GIS software to draw a rectangle (see above diagram). All areas within the rectangle within the specified depth interval form a polygon which represents all or part of the species' described distribution. Sometimes the definition of several rectangles was necessary to describe a species' distribution.
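The construction of the rectangle from points A and B can be sketched as follows. Two assumptions are made for illustration and are not stated in the source: that the degrees.minutes values are available as text such as "35.30" (35 degrees 30 minutes), and that southern latitudes carry a minus sign. The function names are hypothetical.

```python
def dm_to_decimal(value):
    """Convert a 'degrees.minutes' string, e.g. '35.30' for 35 deg 30 min,
    to decimal degrees. A leading '-' is assumed to mark south or west."""
    text = value.strip()
    sign = -1.0 if text.startswith("-") else 1.0
    degrees, _, minutes = text.lstrip("+-").partition(".")
    minutes = (minutes or "0").ljust(2, "0")        # '35.3' is read as 30 min
    minutes_value = float(minutes[:2] + "." + minutes[2:])
    if minutes_value >= 60:
        raise ValueError(f"minutes out of range in {value!r}")
    return sign * (int(degrees) + minutes_value / 60.0)

def rectangle(lata, longa, latb, longb):
    """Bounding rectangle (west, south, east, north) in decimal degrees
    from the lata/longa and latb/longb fields."""
    lats = sorted(dm_to_decimal(v) for v in (lata, latb))
    lons = sorted(dm_to_decimal(v) for v in (longa, longb))
    return (lons[0], lats[0], lons[1], lats[1])
```

The resulting rectangle would then be clipped to the depthmin to depthmax interval using the bathymetry coverages to give the distribution polygon.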
Distribution maps represent the geographical range of a species at one moment in time (Miller, 1994). Mapped distribution is the most complete possible, incorporating larval, reproductive and migratory ranges in the rare cases in which such information is available. Consequently, individuals of the species may be absent from parts of the mapped distribution at certain seasons/years/life stages. In addition, biodiversity hotspots may be temporally and spatially dynamic, following upwelling events or seasonal changes in species distributions (Yatsu, 1995).
Reliability scores were added either by the taxonomy group as a whole during the BioTax '96 Workshop or later as part of a cooperative effort on the part of Peter Last, Martin Gomon and Patricia Kailola.
The information stored for the point distributions is as follows:
rec       Oracle record number
srcfname  Name of source
species   Species number from the CAAB list
id        Reference identifier used by museum/source
latitude
longitude
mindepth  Minimum depth
maxdepth  Maximum depth (=mindepth for most cases)
remarks
type      Type of record (0=Unknown, 1=Museum, 2=Survey, 3=Literature)
A list of potential Workshop participants was compiled to cover Australian-based scientists with significant expertise in fish taxonomy and distribution or with strategic involvement in the Bioregionalisation Project. Due to financial constraints, the provisional list of 45 had to be reduced to 16, the foremost selection criteria being the ability to provide maximum input into the refinement of species distributions and the possession of skills required for an efficient operation of the Workshop. The list of Workshop attendees is inside the cover of this report.
Participants were responsible for assisting in determining the primary and secondary geographical distributions for species occurring within the four coastal ecosystems inshore of the continental shelf break. Some vetting of distribution maps was also required before the Workshop.
The number of species distributions vetted and refined from the BioTax '96 Workshop totals 1036 species with 175 of these species having breeding range information entered. The bulk of the species distributions are for the coastal and shelf bands (4925 records for coastal marine band, 1978 for shelf demersal, 499 for shelf pelagic). Reliability scores were also assigned to each species, rating the probable accuracy of the described distribution.
Data reduction process (A,B,C,D lists)
The method by which the Key List of species was selected for the workshop is diagrammatically illustrated in Figure 10-1.
From the data set of ca. 3200 species, a rapid assessment technique using an information index based on genera (rather than species) provided relative scores of "potential importance". These were divided into ecosystem lists in which only the genera represented in each of the 4 ecosystems (estuarine, coastal, shelf demersal, pelagic) were ranked. From these lists the "top" 300 species were collated into a "priority list" of about 900 species. The "well-defined species" list (list D) was then extracted from the priority list, as species with high reliability scores were felt not to need examination at the workshop given the time constraints. From the remaining "poorly-defined species" list, those groups for which more information probably could not be added at the workshop (such as the sharks and rays) were extracted into an "excess key species list" (list B). The remaining "poorly-defined species" were combined with the "String List" (list C), about 350 species which had been selected from the original data set because of their narrow ranges. The result was the final "Key List" (list A), which was covered in the workshop.
Figure 10-1 Diagrammatic representation of the reduction for the data list of polygons used to preselect species for consideration by the BioTax '96 participants.
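The reduction to lists A to D amounts to a few set operations, sketched below. The input names (`priority`, `well_defined`, `no_new_info`, `string_list`) are hypothetical labels for the groups described in the text, and species are represented by arbitrary identifiers.

```python
def reduce_lists(priority, well_defined, no_new_info, string_list):
    """Derive lists A-D from the priority list, following the text:
    D = well-defined species taken out of the priority list,
    B = poorly-defined species unlikely to gain information at the workshop,
    C = the String List of narrow-range species,
    A = remaining poorly-defined species combined with C (the Key List)."""
    list_d = priority & well_defined
    poorly_defined = priority - list_d
    list_b = poorly_defined & no_new_info
    list_c = set(string_list)
    list_a = (poorly_defined - list_b) | list_c
    return {"A": list_a, "B": list_b, "C": list_c, "D": list_d}
```

A species can appear in both the priority list and the String List; the union in the last step keeps it once in list A.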
These reliability scores (c.f. 8.1) were subsequently used as a further selection criterion in grouping species for the purposes of bioregional analysis.
Genera were ranked from lowest score to highest within each of the four ecosystems, and the resulting priority list was used in concert with a listing of all polygons in the database to determine (i) for which species and genera more distributional information was required, and (ii) important species for scrutiny by experts at the BioTax '96 Workshop.
The polygon refinement as part of the workshop involved manual modification on the paper maps with the modifications being entered into the GIS as part of the post-workshop task.
Refinement of the polygons was done with an AML script and menus. Although it can be thought of as editing, at no stage were the original polygon coverages modified. Two methods were evaluated to handle the refinement. Both involved the user interactively creating a bounding polygon which was then clipped with the appropriate bathymetry contour lines in order to generate the final polygon.
The first was to create complete new polygons whilst the editing was being done. The advantage to this is that the user gained some feedback as to possible errors in the bounds of the data. The main disadvantage was that this was time consuming.
The second method was to store the bounding polygon which was entered and re-create the polygon coverage at a later date. The obvious advantage was the removal of the clipping and erasing operations which allowed significant speed improvements. Also since the majority of the polygon entry involved crossing the coastline the generation of data for the string analysis was significantly less complicated.
Polygons which were entered were named with a serial number so that multiple entries could be manually generated if necessary. This removed the possibility of an error removing a previously valid polygon. As such, a requirement of the second approach was to maintain links between the coverage being 'edited' and its attributes. This was accomplished with an ASCII logfile which detailed the operations that were entered and the polygons to which they applied.
The logfile contains the following fields:
B   Species Number
C   Initial Coverage Filename
ZD  Date Of Entry
I   Oracle ID Of Selected Record
R   Remark/Comment Field
G   Name of Coverage To Use In Generation Of Polygon
N   Minimum Depth
X   Maximum Depth
D   Name of Coverage To Use In Deletion Of Polygon
ZB  Breeding Range Polygon (True Or False)
F   Finished Species Number
T   Terminated Species Coverage
A preprocessing program was used to filter out those entries which had been terminated. Three scripts were written to convert from the logfiles into coverages.
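The preprocessing step might look something like the sketch below. The physical layout of the logfile is not given in the source, so the layout used here is entirely an assumption: one "CODE value" pair per line, a new entry starting at each B line, and a T line inside an entry marking it as terminated. The function names are hypothetical.

```python
def parse_entries(lines):
    """Group 'CODE value' lines into entries, starting a new entry at each
    B (species number) line. Assumed layout, not documented in the source."""
    entries, current = [], None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        code, _, value = line.partition(" ")
        if code == "B":
            current = {}
            entries.append(current)
        if current is not None:
            # A code may repeat within an entry (e.g. several G lines).
            current.setdefault(code, []).append(value.strip())
    return entries

def drop_terminated(entries):
    """Keep only entries that contain no T (terminated) line."""
    return [e for e in entries if "T" not in e]
```

The surviving entries would then be passed to the conversion scripts described next.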
The first was a simple union and rename. This script unioned the bounding coverages contained within each record and gave it an appropriate name. None of the original Oracle generated coverages were used in the process. This version was ideal for string analysis as it gave a bounding polygon for the species which crossed the coastline at appropriate locations.
The second script was similar to the first except that the coverages were clipped and erased with bathymetric contours before the union. This generated the polygons which could be thought of as 'final' and were ideal for viewing and mapping purposes.
The third script was envisaged to be the same as the second, except that the original Oracle generated coverages were to be used as a starting point. In this scenario the user chose those records which most closely matched the final distribution and proceeded to add and delete polygons to this set. This script was not fully developed, as the entered polygons were based on zero records being selected. That is, the polygons entered were to be considered independent of the original Oracle records. The execution time of this script was also considered unacceptable due to the large amount of merging involved.
The procedure for modifications at the workshop was essentially identical for each session:
The method of vetting polygons was as follows:
Polygon ranking                   Future action
Reasonably reliable or better     none
Differences of opinion exist      none, S, GL
Some uncertainty                  GL, S, SE
No idea                           SE
(S - specialist at workshop; SE - specialist elsewhere; GL - Gomon/Last)
The modifications noted on the printed distribution maps were used for editing the polygons. Modifications were also provided by Martin Gomon, Peter Last and Patricia Kailola during the weeks subsequent to the workshop. Delegates also provided feedback after the workshop by checking their records for information that could contribute further to what had been discussed.
It is widely appreciated that information on the breeding range of a fish species is more useful than that of its total range. More mobile species can have greater extralimital occurrences than less active ones making a determination on the naturally occurring range of the species that much more difficult. Unfortunately, the breeding ranges of most marine species are ill defined or undocumented. Apart from a few commercial species, impressions are at best based on the understanding of the primary range of adult individuals.
As the reliability of known distributional and breeding range data varies dramatically between species, an attempt was made to quantify differences. Two classifications that consider the degree of reliability of distributional polygons were devised: a total range reliability index (R) to define the basis of the known distribution of species, and a breeding range index (B). All shelf species were scored according to both B and R criteria below.
Reliability Index (R) -
The broad distributional polygon of a species is based on:
Where the understanding of the distribution of a species was considered to be basic or worse (i.e. criteria 4 or 5), but where one limit of the polygon (occasionally more where disjunct distributions exist) was considered to be adequately known (or better), the reliability score was suffixed by the compass direction of the reliable limit (i.e. 'n', 's', 'e', 'w'). These boundaries provided additional information for the delineation of zootones and boundaries.
Breeding range index (B) -
The breeding range of a species was defined as either:
Disjunct distributions were indicated by suffixing breeding range scores with a 'd'.
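The suffix conventions for the R and B scores can be read back with a small parser such as the one below. It assumes, for illustration only, that a score is written as a number followed directly by its suffix letters (for example "4n" or "2d"); the function names are hypothetical.

```python
import re

_SCORE = re.compile(r"^(\d+)([a-z]*)$")

def parse_score(text, allowed_suffixes):
    """Split a score such as '4n' into (4, {'n'}); reject unknown suffixes."""
    match = _SCORE.match(text.strip().lower())
    if match is None or set(match.group(2)) - set(allowed_suffixes):
        raise ValueError(f"unrecognised score: {text!r}")
    return int(match.group(1)), set(match.group(2))

def parse_reliability(text):
    """R score with optional compass suffixes for reliably known limits."""
    return parse_score(text, "nsew")

def parse_breeding(text):
    """B score with an optional 'd' suffix for disjunct distributions."""
    return parse_score(text, "d")
```

For example, `parse_reliability("4n")` gives the score 4 with a reliable northern limit, and `parse_breeding("2d")` gives the score 2 flagged as disjunct.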