VMINE is based on the same simple set of design principles as the DATAMINE system.
These evolved from those used for the development of G-EXEC by a team at the
British Geological Survey in the 1970s (led by Professor Keith Jeffery, and of which
I was one of the central members), and before that from those used by
Dr T. Victor Loudon when he developed the Rokdoc package at the University of Reading in the 1960s.
The G-EXEC design principles, adopted for DATAMINE, which remain just as valid today,
include the following:
- GENERALITY - Data Independence: in principle any program in the system can make sense of any
data file presented to it (at least sufficiently to determine whether the data are appropriate to the
task the program performs). This is achieved through the "self-describing file" which is
a table with a minimal metadata header that identifies the file contents. The DATAMINE file
has an interesting history. It was modelled essentially on the G-EXEC "G-STAR" file.
However, when the system was initially developed on 8-bit micros running the CP/M operating
system and a vanilla-flavoured Fortran compiler, the practical constraints made it
necessary to pack DATAMINE logical files into large physical files with fixed pre-defined
names. This necessitated a fixed page structure for the files, which in turn imposed limits
on the number of fields which could be used. Within a few years, with the porting of
DATAMINE to the IBM PC, the hardware and software constraints were eased, and DATAMINE
files could be stored as separate physical files - just like the G-EXEC files they had evolved from.
However, the page structure was retained, and remains to this day in the .DM file.
Another reason for the page structure was that it allowed
the multiple-buffering sub-system coded in the early DATAMINE to provide
reasonably fast and efficient disk reads and writes. Both reasons have long since fallen away.
The development of the first "DATAMINE Studio" version in the mid-1990s would have been
a golden opportunity to define a new and better file structure. Doing so was one of the
most important items on my own to-do list when I left the company in 1993.
- PORTABILITY - to different machines and operating systems: this is achieved by
writing as far as possible in a standard machine-independent language. Originally this was
standard FORTRAN-66, but as FORTRAN-77 became available it made sense to migrate to it. The
machine-dependent coding was kept to an absolute minimum, and was done as far as possible
in a system-specific FORTRAN or, only where absolutely necessary, in C. Today it would be possible
to dispense altogether with the C language modules and code the entire system in a more modern
FORTRAN 95 or FORTRAN 2003.
- INTEGRATION: all the programs within the system are engineered to have the same
'look-and-feel', whatever their origin or function.
- EASE OF USE - Command Level Consistency: Each logical task is a single callable "process program".
This defines the "atomic level" at which the user interacts with the system. This is made clear in the
Such a task would be, for example, to do a relational join on two files to produce a new file,
or to create a kriged block model from a data file. Higher-level tasks are carried out through macros
which string together these 'atomic' process programs.
- RELATIONAL DATABASE: The central core of DATAMINE, implemented directly from G-EXEC, is a
set of relational database management functions. These are coded as a collection of
process programs which, together with related utility programs, made G-EXEC a
powerful general-purpose scientific data processing machine. All data are held in simple
tables, in rows and columns: no complicated hierarchical or network-style data structures
are allowed within individual files.
- MODULARITY: The system was designed top-down, and was built bottom-up.
This provides a framework for clear modularity in which each particular task is carried out by
one subroutine or function, coded once and once only.
This means that when an update is made or a bug is fixed, the benefits are felt everywhere.
- ABSENCE OF LIMITATIONS: One of the key design features from the beginning was that
DATAMINE should not be constrained by memory or processor capacity.
- STANDARDISATION OF PROGRAM CONTROLS: In order to run any process program, it is necessary to define
some or all of a set of files, fields, and parameters. These are
logically distinct types of entity and should never be confounded. This evolved directly from the
command structure of G-EXEC.
- UPWARD AND BACK-COMPATIBILITY: As the system develops, many
users either do not want or are unable to spend (waste?) time
on updating their tried and tested macros. Therefore, once a process program
is released, it should not be withdrawn, because withdrawing it risks
breaking users' existing work.
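
As an illustration only, several of the principles above (the self-describing file, atomic
process programs controlled by separate files, fields and parameters, and macros that chain
them) can be sketched in a few lines of Python. This is not DATAMINE or G-EXEC code: the
`Table` class, the `join` and `select` programs, the `run_macro` helper, and the field names
`BHID` and `AU` are all assumptions made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """Hypothetical self-describing table: a minimal header plus rows."""
    name: str
    fields: list                               # header: ordered field names
    rows: list = field(default_factory=list)   # each row is a tuple of values

    def has_fields(self, *names):
        """Let any program check from the header whether the data suit its task."""
        return all(n in self.fields for n in names)


def join(files, fields, params):
    """Process program: inner relational join of two tables on one key field."""
    a, b, key = files["in1"], files["in2"], fields["key"]
    if not (a.has_fields(key) and b.has_fields(key)):
        raise ValueError(f"key field {key!r} missing from an input file")
    ka, kb = a.fields.index(key), b.fields.index(key)
    extra = [f for f in b.fields if f != key]  # second table's non-key fields
    lookup = {}
    for r in b.rows:                           # index second table by key value
        lookup.setdefault(r[kb], []).append(r)
    out = Table(files["out"], a.fields + extra)
    for ra in a.rows:
        for rb in lookup.get(ra[ka], []):
            out.rows.append(ra + tuple(v for i, v in enumerate(rb) if i != kb))
    return out


def select(files, fields, params):
    """Process program: keep rows whose field value is at least params['min']."""
    src, name = files["in"], fields["f"]
    if not src.has_fields(name):
        raise ValueError(f"field {name!r} missing from input file")
    i = src.fields.index(name)
    out = Table(files["out"], list(src.fields))
    out.rows = [r for r in src.rows if r[i] >= params["min"]]
    return out


def run_macro(steps, workspace):
    """Macro: run process programs in order; every step uses the same
    three kinds of control (files, fields, parameters)."""
    for program, files, fields, params in steps:
        # Input file names are resolved to tables; the output stays a name.
        resolved = {role: (n if role == "out" else workspace[n])
                    for role, n in files.items()}
        result = program(resolved, fields, params)
        workspace[result.name] = result
    return workspace


# Invented sample data for the sketch.
holes = Table("holes", ["BHID", "X", "Y"],
              [("A1", 10.0, 20.0), ("A2", 30.0, 40.0)])
assays = Table("assays", ["BHID", "AU"],
               [("A1", 1.5), ("A1", 0.2), ("A2", 3.1)])

ws = run_macro(
    [(join, {"in1": "holes", "in2": "assays", "out": "merged"},
      {"key": "BHID"}, {}),
     (select, {"in": "merged", "out": "highgrade"}, {"f": "AU"}, {"min": 1.0})],
    {"holes": holes, "assays": assays},
)
print(ws["highgrade"].fields, ws["highgrade"].rows)
```

Each program takes the same three groups of controls and returns a new table rather than
altering its inputs, so the macro is simply an ordered list of such calls. The sketch
ignores field types, duplicate field names across the joined tables, and any on-disk format.
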
Using these design principles - together with a large amount of pre-existing
coding from G-EXEC and elsewhere - DATAMINE was written within two years,
from start-up in February 1981 to a complete, released, saleable product by 1983,
with the bulk of the new coding done by just two people.
The system development and the underlying principles are
discussed in a series of published and unpublished papers
(view and download at
For more information, please email me at