Silicon Dale

Geo-Data Transfer Standards

Since the early days of computing there has been a requirement for data communication between one computer and another, and between one hardware/software environment and another.

The main problems can be divided into:

  1. physical hardware-to-hardware communication
  2. communication from one operating system to another
  3. binary or text encoding standards
  4. communication of semantic information

The means to handle the hardware and operating-system levels of data transfer have always been in the realm of computer industry standards. Until the early 1980s the only generally accepted medium for such transfer was 9-track 1/2-inch magnetic tape, which came in standard lengths of up to 2400 feet (the familiar 12-inch spools). Unfortunately, although the physical form of this medium was fixed, the format of data on it was not. Data could be organised in blocks of any length, with inter-block gaps. Within data blocks, the data actually stored could be encoded in many different ways - as text using EBCDIC or BCD encoding, for example, or as binary using proprietary encoding methods.

In the early 1980s there was a trend towards more compact media with the development of the 5-1/4-inch floppy disk. Unfortunately, microcomputer manufacturers defined an enormous number of different formats for mapping data on to such disks, and it was only with the introduction of the IBM Personal Computer that some order came in the form of a de facto standard: Microsoft DOS. However, this still left the users of mainframes, minicomputers, Unix-based graphics workstations, and Apple computers out in the cold.

In the 1990s this problem was largely solved by the rapid expansion of the Internet. There are still, of course, huge problems in binary data transfer from one computing environment to another, but the Internet does at least provide a common medium for the transfer of text-encoded data.
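
The text-encoding part of the problem can be made concrete with a small illustrative Python sketch; the byte values shown are simply those of standard ASCII and of the common US EBCDIC code page, not of any particular tape format:

    # The same three characters have different byte values in ASCII and in EBCDIC.
    # Python's built-in 'cp037' codec implements the common US EBCDIC code page.
    text = "ABC"
    ascii_bytes = text.encode("ascii")    # hex 41 42 43
    ebcdic_bytes = text.encode("cp037")   # hex c1 c2 c3
    print(ascii_bytes.hex(), ebcdic_bytes.hex())   # 414243 c1c2c3
    # A tape written with one encoding is gibberish under the other
    # unless the reading system translates it explicitly.
    assert ebcdic_bytes.decode("cp037") == text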

The problem of semantic-level transfer of geo-scientific data is one that has been of concern for very many years, and a variety of solutions have been proposed.

One of the earliest attempts was made by Dr Peter Sutterlin in 1975. On sabbatical from his post at the University of Western Ontario, he was based for a year at the Atlas Computer Laboratory (now part of the Rutherford Appleton Laboratory) in England, where he developed the "Filematch" format for transfer of geological data between different data management systems. It was tested initially between his own system ‘SAFRAS' and the British Geological Survey system ‘G-EXEC', but in a workshop towards the end of his sabbatical it was also demonstrated communicating between these and the USGS ‘GRASP' system, the French BRGM's ‘SIGMI' system, and the German ‘DASCH' system. Technically it was very successful. It was a text-encoding scheme designed for use with magnetic tape as the transfer medium. Filematch encoded the data structure in a hierarchic numeric code which prefixed every data item, identifying its position in the data structure. Because the structure was explicitly defined in this way, data could be transferred among relational, CODASYL-standard, or other hierarchic database systems. Filematch concentrated on data structure because that was then the principal problem in transferring data from one system to another; the definition of data content was of much lesser concern.
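
As an illustration of the idea, here is a minimal Python sketch; the dotted numeric prefixes and the sample records are hypothetical, since the actual Filematch syntax is not reproduced here:

    # Minimal sketch only: the real Filematch syntax is not reproduced here.
    # Assume each record carries a dotted numeric prefix (e.g. "1.2.1") locating
    # the data item within the hierarchy, followed by the item itself.
    from collections import defaultdict

    def parse_prefixed_records(lines):
        """Group data items by their hierarchic position code."""
        tree = defaultdict(list)
        for line in lines:
            code, _, value = line.partition(" ")
            tree[tuple(int(part) for part in code.split("."))].append(value.strip())
        return dict(tree)

    sample = [
        "1 BOREHOLE BH001",      # level 1: parent entity
        "1.1 COLLAR 1234.5",     # level 2: an item belonging to entity 1
        "1.2.1 ASSAY Au 3.2",    # level 3: an item two levels down
    ]
    print(parse_prefixed_records(sample))
    # {(1,): ['BOREHOLE BH001'], (1, 1): ['COLLAR 1234.5'], (1, 2, 1): ['ASSAY Au 3.2']}

Because every item carries its own position code, the receiving system can rebuild the hierarchy without knowing anything about the sending system's internal storage model.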

Starting in the 1980s, there has been a series of attempts in Australia to standardise mining data for transfer among different companies and for the submission of reports to the State Mines Departments. The AMDEX/GEODEX standards, developed under the aegis of AMIRA (the Australian Mining Industry Research Association), consisted of data dictionaries and thesauri which, it was hoped, would enable the easy transfer of geological and mining data. However, they did not include any software specifications, and although they have been very influential on subsequent developments, in themselves they did little to facilitate data transfer.

Meanwhile, in the international hydrocarbons industry, POSC (the Petrotechnical Open Software Corporation) was formed. One of the aims of POSC was to develop both standards and software that would allow oil and gas exploration data to be exchanged among its members. Its targets were very ambitious, and in the early 1990s it had ample funding and made considerable headway. However, its standards were not directly transferable to the mining industry, and rapid changes in computing technology have led to increasing problems for POSC in meeting its own core objectives - ‘running hard to stand still'.

From 1992 to 1995 a European Union project was undertaken by a consortium of diverse organisations from five countries in western Europe. This project, named ‘DEEP' (Database Environment for Exploration and mining Projects), set out to develop a new software environment for handling and communicating exploration and mining data. It defined an object-oriented framework representing the logical relationships among geological and mining entities. Unfortunately, there was considerable dissension among the participants. The compromise solutions adopted were inelegant, and the data structures - although object-oriented - were both inflexible and incomplete.

In the mid-1990s another Australian initiative, this time in the public sector, was the development of the CSIRO Geoscience Data Model. Like DEEP, it was defined using object-oriented principles, but CSIRO produced a much cleaner model with a well-defined and consistent data structure. It addressed only a subset of the problem - the representation of geometry - but its solution for that subset was complete and elegant. The data model was subsequently used in another AMIRA project, specifically to develop a software environment for geological/mining data transfer.

OpenGIS is an international consortium (with a centre of gravity firmly in the USA) formed to define spatial data standards for use in Geographic Information Systems - but relevant to any application involving geospatial data. Its activities started in the early 1990s but have recently accelerated with the adoption of XML as a host language in which to express its standards. The CSIRO team have also adopted XML (in a project named XMML - the eXploration and Mining Markup Language) and aligned their domain-specific geology/mining developments with the emerging standards from OpenGIS as well as with the W3C XML standards. POSC, too, is now actively working on XML as a data exchange medium. It seems that at last there might be the prospect of consensus on standards for the exchange of geoscience data. However, many difficulties remain, not least that the power and breadth of XML leave plenty of scope for multiple incompatible solutions to the same problem.
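
As an illustration (a sketch with hypothetical element names, drawn neither from XMML nor from any OpenGIS schema), the same borehole collar can be expressed in XML in two entirely different, mutually incompatible ways:

    # Two hypothetical XML encodings of the same borehole collar: both are
    # well-formed XML, yet software written for one cannot read the other.
    import xml.etree.ElementTree as ET

    doc_a = ET.fromstring('<borehole id="BH001" x="421350.0" y="6453120.0" z="312.5"/>')
    doc_b = ET.fromstring(
        "<Hole><Name>BH001</Name>"
        "<Collar><Easting>421350.0</Easting>"
        "<Northing>6453120.0</Northing>"
        "<Elevation>312.5</Elevation></Collar></Hole>"
    )

    # Reading the same value requires schema-specific code for each encoding.
    easting_a = float(doc_a.get("x"))
    easting_b = float(doc_b.find("Collar/Easting").text)
    assert easting_a == easting_b

XML guarantees only that such documents can be parsed; agreement on element names, structure, and meaning still has to come from a shared schema.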

The examples I have described are those of which I have direct knowledge, from working with them as either a developer or a user: you will see that there is a clear bias towards European and Australian developments! I would welcome contributions of information on other data transfer initiatives which we might use to expand this review in a future issue.

Stephen Henley
Matlock, England
steve@silicondale.co.uk
www.SiliconDale.com

Copyright © 2001 Stephen Henley
Geo-Data Transfer Standards: Earth Science Computer Applications, v.16, no.6, p.1-3