Brian O'Shea's home page

Description of the TAXA System

[This description originally appeared in The Bryological Times 73:1-2 (1993), but has been updated (last update 16 July 2002).]

When compiling a checklist of Malawi bryophytes (O'Shea, 1993), I thought it important that all data was held on a computer database. The database, called TAXA, was written by the author in the dBASE IV language for use under the MS-DOS operating system on an IBM-compatible PC, and is based on a program of the same name produced some years earlier to hold details of the UK and European bryophyte flora. It has been significantly enhanced subsequently and now holds the following data:

1. FAMILIES A world list of all currently used moss and liverwort families, with authorities. All families are identified by a unique 4 character alphabetic code (usually the first four characters of the family name). Both family and genus names are based on Crosby and Magill (1977), and Grolle (1983), with later additions especially from the Index of Mosses (Crosby et al., 1992, 1994 and 1997), and including families listed in Crosby et al., 1993. Fern families and genera from Derrick et al. (1987) are also included.

2. GENERA A world list of all currently used moss and liverwort genera, including the authority, and the family to which each genus belongs. All genera are identified by a unique 5 character code, based on the genus name.

3. SPECIES/SUBTAXA A file of all bryophyte taxa so far required by the system (currently numbering over 20000, including many synonyms). This includes all European and African taxa, as well as many others, including all tropical mosses and those occurring in America, Australia, parts of Asia, etc. The concept of synonyms is supported: synonyms can be conditional (i.e. used on one list but not another), and a user who enters a synonym (e.g. when entering collection details) can swap it for the owning taxon. To make this possible, every taxon name entered onto the system is checked to see whether it is a synonym. Species can be linked to literature references. It is possible to hold fern, lichen and flowering plant names as well as bryophytes. Taxon data can be exported and imported between different TAXA systems. Bisby (1994) is followed for plant names, with the following exceptions: hybrids and cultivars are excluded; aggregates are not supported by name but by connections to taxa within the aggregate; the display of subtaxa does not include species authority (but could, as the data is held); classes of taxon name are not supported; homonyms are not supported; synonym use is supported extensively, but different synonym types (e.g. true vs. orthographic variants etc.) are not recognised. The Brummitt and Powell (1992) scheme of standardised authority names is followed.
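
The conditional synonym check described above can be sketched roughly as follows. This is a minimal illustration in Python, not the dBASE code used by TAXA; the names `Synonym`, `SYNONYMS` and `resolve`, the placeholder taxon names and the list codes are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Synonym:
    name: str                           # the name as entered by the user
    accepted: str                       # the owning (accepted) taxon
    only_on_list: Optional[str] = None  # None = applies everywhere; else one list code


# Hypothetical sample data with placeholder names and list codes.
SYNONYMS = [
    Synonym("Aus bus", "Aus cus"),                      # unconditional synonym
    Synonym("Aus dus", "Aus eus", only_on_list="EUR"),  # synonym on list EUR only
]


def resolve(name: str, list_code: str) -> str:
    """Return the owning taxon if `name` is a synonym on `list_code`, else `name`."""
    for syn in SYNONYMS:
        if syn.name == name and syn.only_on_list in (None, list_code):
            return syn.accepted
    return name
```

Under these assumptions, `resolve("Aus dus", "EUR")` gives the owning taxon, while the same name entered against a different list is left unchanged.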

4. LISTS The system is structured around a concept of hierarchical lists relating to geographical areas. For instance, the UK list of taxa is part of the European list, which is part of the world list; an entry of a taxon on a lower level list thus automatically implies membership of any higher level list(s) (and can delete any list records for the taxon held at the higher level). Each entry on a list represents a taxon found in that area, and contains a pointer to the full details of the taxon held in the species file. Literature references are used to support the occurrence of a taxon on a list (although references can also be linked independently to lists). There are facilities for transferring taxa between lists, and for transferring lists between taxa, for instance if one becomes a synonym of another. Distribution data can be exported and imported between different TAXA systems. Each list name has a three-character abbreviation, following Hollis and Brummitt's (1992) scheme for recording plant distributions; however, Hollis and Brummitt's levels and '3+2' codes are not followed, and all lists at whatever level have a three-character code, the concept of 'level' already being implemented via the TAXA list hierarchy. This gives a more flexible scheme, as the hierarchy can be changed to add or remove levels. None of the additional three-character codes coincides with any of Hollis and Brummitt's codes.
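
One way to picture the list hierarchy is the following minimal Python sketch. It is not taken from TAXA itself: the codes `UKD`, `EUR` and `WLD`, the `PARENT` table and the function names are illustrative assumptions, and the codes are not the actual abbreviations used by the system.

```python
# Parent of each list; the top-level list has no parent.
# The three-character codes here are illustrative only.
PARENT = {"UKD": "EUR", "EUR": "WLD", "WLD": None}


def ancestors(code):
    """Return every higher-level list above `code`, nearest first."""
    out = []
    parent = PARENT.get(code)
    while parent is not None:
        out.append(parent)
        parent = PARENT.get(parent)
    return out


def add_taxon(membership, taxon, code):
    """Record `taxon` on list `code` and drop redundant higher-level records."""
    for higher in ancestors(code):
        membership.get(higher, set()).discard(taxon)
    membership.setdefault(code, set()).add(taxon)


def is_member(membership, taxon, code):
    """True if `taxon` is recorded on `code` or on any list beneath it."""
    return any(
        taxon in taxa and (lst == code or code in ancestors(lst))
        for lst, taxa in membership.items()
    )
```

In this sketch a record is stored once, at the lowest level known, and membership of the higher lists is derived by walking up the hierarchy rather than by storing duplicate records.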

5. REGIONS The area to which each list refers is divided into Regions. These might be constituent countries (e.g. England, Ireland, Scotland, Wales), or states in the USA etc.

6. DISTRICTS Each region is divided into Districts, e.g. vice counties in the UK, counties in the USA. For instance, Malawi has 24 Districts, spread over 3 Regions.

7. LOCALITIES A central list of localities is held, so each collection can be linked to a locality and, via the District in which it was found, to a list. A locality can in turn be linked to a bibliographic reference. All localities input are checked for duplication against the existing database.

8. DISTRIBUTION It is possible to hold distribution data at 3 levels: list (=country), district (e.g. UK vice counties, US states) or sub-districts, and there are transactions to manage all of these. A specific example of taxa being recorded against Districts is the UK bryophyte Census Catalogue. It is possible to list all taxa for a district (vice county), and all districts for a taxon. This can also, as with the Malawi list, be linked to lower level information (e.g. the original collection numbers) extracted from references. In order to allow tetrad or half kilometer recording, or any other gridded recording for District/county floras, vice county (=District) distribution is now linked to grid references (sub-District divisions). Transactions are available, for instance, to list the taxa for a square or the squares for a taxon. Data for the French Vosges has now been entered, and this uses an automated link to DMAP (as for collections) to display distribution maps. Grids without a common point of origin (e.g. as used in Germany) can also be used via a translation table. There are also transactions to look at endemism and diversity, which allow lists to be grouped and compared with each other, to ask such questions as: are there any endemics in this group of lists, which taxa occur in tropical Africa and America but not in tropical Asia, and so on.
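
The endemism and comparison questions above amount to set operations over groups of lists. The following is a minimal Python sketch under that assumption; the function names, the list codes and the placeholder taxon names are invented for illustration and do not come from TAXA.

```python
def taxa_in(distribution, lists):
    """Union of the taxa recorded on any of the given lists."""
    result = set()
    for code in lists:
        result |= distribution.get(code, set())
    return result


def endemics(distribution, group):
    """Taxa found within `group` and on no list outside it."""
    outside = [code for code in distribution if code not in group]
    return taxa_in(distribution, group) - taxa_in(distribution, outside)


def in_all_but_not(distribution, present_groups, absent_group):
    """Taxa occurring in every group of `present_groups` but not in `absent_group`."""
    sets = [taxa_in(distribution, group) for group in present_groups]
    common = set.intersection(*sets) if sets else set()
    return common - taxa_in(distribution, absent_group)


# Hypothetical distribution data: list code -> taxa recorded on that list.
DIST = {
    "AF1": {"Aus bus", "Aus cus"},
    "AF2": {"Aus cus", "Aus dus"},
    "AM1": {"Aus bus", "Aus cus", "Aus eus"},
    "AS1": {"Aus eus"},
}
```

With this sample data, `endemics(DIST, ["AF1", "AF2"])` asks which taxa are confined to the two "African" lists, and `in_all_but_not(DIST, [["AF1", "AF2"], ["AM1"]], ["AS1"])` asks which occur in both the first two groups but not in the third.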

9. COLLECTIONS Collection management is a well developed aspect of the system, and has been used for the BBS Malawi and Uganda expeditions, as well as holding normal collection book data. The collection data can be used to generate dot maps via an automated link to and from the DMAP (for DOS) and DMAPW (for Windows) programs, and can also be searched in various ways, e.g. for all material identified down to genus level, all plants for a specific genus or grid reference, etc., and labels can be produced. Collection locations can be entered as either grid references or lat/long, as well as by name. Collection data can be accessed by either a unique internal collection number, or a 'foreign' collection number (i.e. the original collector's number). Other information is also available, such as a list of localities or habitats in which a taxon was found, or its associates, or a graph of altitudinal range, or a list of all collections for a locality or grid reference. This area of the database is now being used extensively when writing up the collections from the BBS TBG expeditions of 1991 (Malawi) and 1996-8 (Uganda). Loans can be recorded, and a loan history is kept for each collection. Loans can be either for individual collections, or collections can be grouped, and in the latter case the loans will be sent as a group but can be returned either as a whole or in parts.

'Abbreviated collections' can also be held, to allow distribution data gathered from literature or herbaria to be entered on the system without the overhead of storing large amounts of detailed collection data. Like the full collection data, these are linked to DMAPW for Windows to generate dot maps.
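
The loan handling described under Collections, where grouped collections go out together but may come back in parts, might be modelled along the following lines. This is a hedged Python sketch only; the `Loan` class, its fields and `loan_history` are hypothetical names and say nothing about how TAXA stores loans internally.

```python
from dataclasses import dataclass, field


@dataclass
class Loan:
    borrower: str
    collections: set                       # collection numbers sent out together
    returned: set = field(default_factory=set)

    def return_items(self, numbers=None):
        """Return the whole loan (numbers=None) or only some of its collections."""
        numbers = set(self.collections) if numbers is None else set(numbers)
        unknown = numbers - self.collections
        if unknown:
            raise ValueError(f"not part of this loan: {sorted(unknown)}")
        self.returned |= numbers

    @property
    def outstanding(self):
        """Collections from this loan that have not yet come back."""
        return self.collections - self.returned


def loan_history(loans, number):
    """All loans, in the order given, that included collection `number`."""
    return [loan for loan in loans if number in loan.collections]
```

A loan for a single collection is simply a group of one, so the same structure covers both cases mentioned above.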

10. ACCESSIONS The author holds all his herbarium information on TAXA. This system links collections to a herbarium accession number, and allows this data to be queried in various ways, such as all accessions for a species, or all for a genus. Accession labels are also printed.

11. BIBLIOGRAPHY A bibliography is held containing all relevant references in the system. Each reference can be associated with as many lists (countries) or list members (individual taxa) as appropriate, or any localities, and lists of personal literature holdings can be held. Literature references are used generally to support information in the database (e.g. as authorities for synonyms, or the occurrence of a taxon in a country). Multiple keyword search facilities are provided, and an unlimited number of keywords can be held for each reference. Nearly 7000 literature references are held on the system.
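
A multiple keyword search over references with any number of keywords each could look something like the sketch below. It is written in Python for illustration; the `search` function, its `match_all` switch and the sample reference identifiers are assumptions, and the source does not say whether TAXA combines keywords with AND or OR.

```python
def search(references, keywords, match_all=True):
    """Return the ids of references matching the keywords, case-insensitively.

    match_all=True requires every keyword (AND); False accepts any one (OR).
    """
    wanted = {k.lower() for k in keywords}
    hits = []
    for ref_id, ref_keywords in references.items():
        have = {k.lower() for k in ref_keywords}
        matched = (wanted <= have) if match_all else bool(wanted & have)
        if matched:
            hits.append(ref_id)
    return sorted(hits)


# Hypothetical sample data: reference id -> keywords held for that reference.
REFS = {
    "ref1": ["Malawi", "checklist"],
    "ref2": ["Uganda", "checklist", "mosses"],
    "ref3": ["Malawi", "mosses"],
}
```

Because each reference simply maps to a collection of keywords, there is no fixed limit on how many keywords a reference can carry.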

12. JOURNALS Each bibliographic reference is keyed by a journal abbreviation to a journal file, which contains full details of the journal.

13. OTHER SPECIALIST DATABASES Special databases have been provided for CHROMOSOMES and OIL BODIES. It is possible to hold chromosome data against any taxon, and list and print this data, and similarly for oil bodies. A database also exists for recording microscope slides, linked where relevant to the taxon file and collection file. A similar database for photographs is also planned.

14. TAXONOMY There are a number of facilities designed to help taxonomic work, such as the ability to record specimens being examined for a revision, to relate literature references to taxa at generic, species or subtaxon level, or to view synonyms for a taxon.

There are processes associated with the maintenance of all of this data (which is held in over 70 files), and in addition, a number of other processes for displaying, listing and searching for data, including the production of the formatted lists included in the Malawi checklist (O'Shea, 1993) and African checklist (O'Shea, 1995, 1999, 2003). The lists can be output by TAXA containing special characters that can be used by word processors to determine the use of italic, underlined and bold characters, and the formatting of the document. The final copies of both the Malawi and African checklists were produced using WordStar 6 (for the initial formatting) and Microsoft Word for Windows (for the final print).

There are currently over 240 menu options, together with a number of options selected at run time (e.g. what order to print the records in). Now that the African checklist is published, work is continuing on adding new checklists to the database, and currently Europe, SW Asia, Africa, North America, Latin America, most of the Indian sub-continent, Malaysia, Philippines, Indochina, parts of Indonesia, Japan, China, Australasia and Oceania are loaded. Work is continuing on building up data for sub-Antarctic islands and India. Although written in dBASE, the program is compiled, so it can be distributed on its own, and doesn't require the user to have the dBASE product installed. The system is under constant development, both in the addition of new facilities to the programs, and in the addition of extra data. At the last count, the system consisted of over 30000 lines of program code and required around 40 megabytes of disk storage, the majority of which is taken up by the main data files, and the rest by indices, temporary storage files and program code.

Anybody who would like to use the system can have a copy of the author's system by sending a blank CD-ROM, but as the user manual is out of date, a degree of support is necessary, and the amount of time available for this is limited. A utility is available to replace the sample data files provided with empty files for your own use. Because of the size of the system, it is recommended that the PC should have 32 MB of RAM, and the speed is rather slow on anything less than a Pentium II. A re-write of the system for Microsoft Windows was started, but has been temporarily abandoned because of the length of time it is likely to take.

Requests for specific enhancements to the system are welcome: most changes are the result of user requests.



