Silicon Dale

Spatial and temporal databases

Following the previous month's Silicon Dale column, we continue this month looking at the problems of partially missing or imprecise data, by concentrating on spatial and temporal data. A great deal of attention has been given to this by the GIS suppliers as well as by the largest of the DBMS vendors, Oracle. However, no-one seems to have produced a solution which is really satisfactory - in particular none of the commercial solutions seem to offer much in the way of 3D capabilities, and although many of the GIS suppliers tackle the precision issue, their database solutions tend to be inelegant and incomplete.

One of the simplest examples of spatial data to be handled is that of multiple logs of the same set of drillholes. Drillhole data are conventionally supplied in one-dimensional form (that dimension being the down-hole distance), though of course they refer to three-dimensional locations and can be mapped to their 'real' positions by use of drillhole survey data. If a database is unable to handle the one-dimensional case, then it is most unlikely to be able to handle the much more complex two and three dimensional situations.

Given two logs - say a geological log recorded on unequal intervals (representing the lithological units seen in the core) and a set of assays recorded on regular sample lengths, what can a relational database management system do to help us link them together. The answer is not much. There will be a common key: 'drillhole ID'. Beyond this, the downhole distances, although measured on the same scale (in relational terms, they belong to the same domain), the actual intervals are different, so the records in the two sets of data cannot be matched. It might in principle be possible to set up SQL commands using '>' and '<' relationships to build a table that matches assays with lithological intervals, but splitting records of one or other, at mismatched depths would be tricky, especially considering the number of special cases that would have to be allowed for. The solution adopted in Datamine, and presumably most other mining software systems, was to write a special-purpose program for this requirement.

This problem is serious enough for one-dimensional data where nothing is missing and all data are assumed to be known with perfect precision. In the geosciences, the more usual situation is that some data are missing (through core loss for example) and the precision is less than perfect (identifying the depth of gradational contacts) - and the database management problem becomes a little more difficult.

Let us take a relatively simple case, where the user wishes to locate a particular horizon H in a number of holes, given just a simple one-table drill-hole stratigraphic database. This horizon is identified by a particular code and corresponding FROM and TO depth data identifying respectively the top and bottom of H. Perhaps the complete horizon is found in most of the drillholes. However, in some there may be various types of incomplete data. The horizon H might be at the very start of the hole. In this case, the FROM value will be zero but will not represent the top of the desired stratigraphic unit. Similarly the interval may be the very last in a hole, and the TO value will represent simply the depth to which the hole was drilled. Or H may not be present at all in some holes. However, this does not mean these holes contribute no information on it. If a horizon J known to lie above the required interval H occurs in the hole, say as the last stratigraphic unit found, then it is known that H lies at a greater depth than the bottom of the hole. A record could be returned for H with the FROM and TO values set to "greater than hole depth". Similarly if horizon G known to underlie H is found at the top of the hole, then a record could be returned for H in this hole with the FROM and TO values both set to "less than 0" (i.e. lying above the collar elevation).

Of course, no database management system can be expected to supply the geological knowledge necessary to make such inferences. However, if the geologist makes such inferences, and records a full set of data for H derived from the drillholes, with the appropriate partial (greater than or less than) data included, the DBMS ought then to be able to make sense of such data. 'Sense' in this context means the ability to draw correct inferences based on the (incomplete) information, which is available, or which can be derived from the data. Unfortunately this is not the case with any existing commercial system that I have encountered.

The problem is that we can have data which may be (or appear to be) complete and precise - such as an exact depth - alongside other data which may be completely or partially missing. We wish to use a database engine to draw inferences - and extract retrieved data sets - based upon operations on these various sorts of data. The operations include but are not limited to conventional database table manipulation. The results will include both known and unknown values.

Needless to say, the universe of database management systems based on SQL is totally inadequate to such a task. C.J. Date compounds the problem when he advocates banning all NULL values from databases and restricting DBMS processing to two-valued logic - returning only TRUE or FALSE answers. He argues that this matches the real world. Clearly he is not a geologist. The alternative approach advocated by E.F. Codd is to use a system of 'marks' to identify different types of 'null', and I have proposed extending this idea to allow for different types of partially missing data (see ESCA v.16 no.1). Given a set of truth tables - some of which would be standard and some of which might be defined by applications or by the user - it would in my view be possible to handle all of the required inference processing. The pre-requisite is a completely new database management system. It is not merely probabilistic reasoning that needs to be allowed (there are experimental DBMSs already based on fuzzy logic), but also reasoning that allows for incomplete data of the types discussed in these articles. As in the drillhole example above, the incompleteness may be well defined and deterministic as far as the DBMS is concerned.

Codd, E.F. 1990: The relational model for database management: version 2, Addison-Wesley, p.203-204.
Date, C.J., 1995: Relational Database Writings, 1991-1994; Addison-Wesley, chapter 9, p.341- 362.
Date, C.J. and Darwen, H. 1998: Foundation for object/relational databases: the third manifesto, Addison?Wesley, 496pp.

Stephen Henley
Matlock, England

Copyright © 2003 Stephen Henley
Spatial and temporal databases: Earth Science Computer Applications, v.18,no.6,p.1-2