Silicon Dale

The Relational Database and Geological Inference

Relational database management is founded on strict predicate logic. In other words, the rows in a relational table of data are propositions or assertions - statements of fact.

For example, given the following table:

Sample No.East (m)North (m)Elevation (m)Au g/tCu %
231301253431500.11.8
232310649521552.010.5
23332255122147trace0.5
the second row can be read as "sample 232, collected at location 3106m east and 4952m north of the survey grid origin, and at elevation 155m above datum, has been assayed for gold at 2.0 grams/tonne and copper at 10.5%". Within the limits of locational and analytical error, this is taken to be a true statement.

Multiple different tables may contain different types of information. Each table has a primary key - a column or combination of columns which is unique (different for every row), and may also have ‘foreign keys', other columns which can be used to link data of different types from different tables. The permissible ‘relational' operations (such as ‘join', ‘select', etc) on tables allow data to be combined in such a way as to yield further true statements - logical inferences - from the original data.

This is the principle. However, no implementation of a relational database management system yet provides safeguards to guarantee the integrity of databases. For example, severe problems can be caused by the presence of duplicate rows in a table, especially when updating (inserting new or deleting obsolete) data. Almost all existing systems claimed to be relational are built around the SQL language. Unfortunately this language is riddled with such flaws (as documented by Date and Darwen, 1998 and by Pascal, 2000) - as a simple example, in allowing keyless tables. The result is that any applications software that relies on integrity of an SQL database inherits such flaws. This includes many of the leading mining software products.

There is a relational engine for geosciences, partially implemented in G-Exec (at the British Geological Survey from 1973 - see ESCA vol.15, no.11) and developed further in the Datamine mining industry system (from 1981). Although not based on SQL and therefore not sharing its particular problems, it nevertheless remains incomplete - and also includes a number of powerful but dangerous non-relational data handling features. This has potentially serious implications for the integrity of data required by the Datamine user undertaking increasingly complex tasks. The problem - common to the Datamine geoscience database engine and to the SQL database engines relied upon by other systems - is that they fail to enforce the logical rigour which is required to maintain database integrity.

Users of systems such as Datamine should be aware of the potential risks to their databases arising not from physical storage problems so much as from design of the database and of the ‘macros' and ‘scripts' developed to manage it. I am writing in particular about the problems of Datamine because this is the system which I know best, though I am sure that other systems have similar, if not even more serious, problems - and certainly these comments apply to all systems which rely on SQL-based DBMSs.

Certain programs within Datamine - powerful though they undoubtedly are - are not only non- relational but if used can be positively lethal to the integrity of a large geological or mining database. While in the early years such programs were found to be very useful (allowing the user to do "anything" with his or her data) they need now to be quietly buried, while the set of relational functionality needs to be completed and made more rigorous.

The data handling functionality originally written into Datamine was intentionally powerful but it did not enforce relational design rules either for database building or for applications software development. With increasing complexity of both databases and applications, this lack of rigour (for example in not requiring every table to contain a unique primary key) can be dangerous. Unfortunately, recent development of Datamine - like other mining systems - has concentrated on applications functionality while failing to recognise that a number of the underlying data-handling capabilities actually threaten the integrity of the applications.

It is possible to fix these problems - and at the same time to improve upon SQL. Unfortunately, a software package which relies on external database management is at the mercy of the third- party suppliers. However, if not constrained by the flawed SQL standard it is possible to develop a full relational functionality - or at least sufficient of this functionality to ensure both rigorous protection of database integrity and efficient access to the data.

References:
Date, C.J. and Darwen, H. 1998: Foundation for object/relational databases: the third manifesto, Addison?Wesley, 496pp
Pascal, F. 2000: Practical Issues in Database Management, Addison-Wesley, 256pp.



Stephen Henley
Matlock, England
steve@silicondale.co.uk
www.SiliconDale.com

Stephen Henley, a member of the G-Exec development team at BGS in the 1970s, was co-founder of Datamine in 1981 and developed the relational database management functionality in the Datamine system - until further advances were cut short by his premature departure from the company in 1993.

Copyright © 2002 Stephen Henley
The relational database and geological inference: Earth Science Computer Applications, v.17,no.10,p.1-2