
DESIGN PRINCIPLES

VMINE is based on the same set of design principles as the DATAMINE system.

DATAMINE was developed on a simple set of design principles. These evolved from those used for the development of G-EXEC by a team at the British Geological Survey in the 1970s (led by Professor Keith Jeffery, and of which I was one of the central members), and before that by Dr T. Victor Loudon when he developed the Rokdoc package at the University of Reading in the 1960s.

The G-EXEC design principles, adopted for DATAMINE, which remain just as valid today, include the following:
  • GENERALITY - Data Independence: in principle any program in the system can make sense of any data file presented to it (at least sufficiently to determine whether the data are appropriate to the task the program performs). This is achieved through the "self-describing file", which is a table with a minimal metadata header that identifies the file contents. The DATAMINE file has an interesting history. It was modelled essentially on the G-EXEC "G-STAR" file. However, when the system was initially developed on 8-bit micros running the CP/M operating system and a vanilla-flavoured Fortran compiler, the practical constraints made it necessary to pack DATAMINE logical files into large physical files with fixed pre-defined names. This necessitated a fixed page structure for the files, which in turn imposed limits on the number of fields which could be used. Within a few years, with the porting of DATAMINE to the IBM PC, the hardware and software constraints were eased, and DATAMINE files could be stored as separate physical files - just like the G-EXEC files they had evolved from. However, the page structure was retained, and remains to this day in the .DM file. Another reason for the page structure was to support the multiple buffering sub-system which was coded in the early DATAMINE to provide reasonably fast and efficient disk reads and writes. Both reasons have long since fallen away. The development of the first "DATAMINE Studio" version in the mid 1990s would have been a golden opportunity to define a new and better file structure. Defining such a structure was one of the most important items on my own to-do list when I left the company in 1993.

  • PORTABILITY - to different machines and operating systems: this is achieved by writing as far as possible in a standard machine-independent language. Originally this was standard FORTRAN-66 but as FORTRAN-77 became available it made sense to migrate to this. The machine-dependent coding was kept to an absolute minimum, and was done as far as possible in either a system-specific FORTRAN or, only where absolutely necessary, in C. Today it would be possible to dispense altogether with the C language modules and code the entire system in a more modern FORTRAN 95 or FORTRAN 2003.

  • INTEGRATION: all the programs within the system are engineered to have the same 'look-and-feel', whatever their origin or function.

  • EASE OF USE - Command Level Consistency: Each logical task is a single callable "process program". This defines the "atomic level" at which the user interacts with the system. This is made clear in the G-EXEC documentation. Such a task might be, for example, a relational join of two files to produce a new file, or the creation of a kriged block model from a data file. Higher level tasks are carried out through macros which string together these 'atomic' process programs.

  • RELATIONAL DATABASE: The central core of DATAMINE, implemented directly from G-EXEC, is a set of relational database management functions. These are coded as a collection of process programs which, together with related utility programs, made G-EXEC a powerful general-purpose scientific data processing machine. All data are held in simple tables, in rows and columns: no complicated hierarchical or network-style data structures are allowed within individual files.

  • MODULARITY: The system was designed top-down, and was built bottom-up. This provides a framework for clear modularity in which each particular task is coded once, and once only, in a single subroutine or function. This means that when an update is made or a bug is fixed, the benefits are felt everywhere.

  • ABSENCE OF LIMITATIONS: One of the key design features from the beginning was that DATAMINE should not be constrained by memory or processor capacity.

  • STANDARDISATION OF PROGRAM CONTROLS: In order to run any process program, it is necessary to define some or all of a set of files, fields, and parameters. These are logically distinct types of entity and should never be confounded. This evolved directly from the command structure of G-EXEC.

  • UPWARD AND BACK-COMPATIBILITY: As the system develops, many users either do not want or are unable to spend (waste?) time on updating their tried and tested macros. Therefore, once a process program is released, it should not be withdrawn, because this risks crashing users' work.
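
Several of the principles above (the self-describing file, the single callable process program, the relational core, and the separation of files, fields and parameters) can be illustrated together in a small sketch. This is not the DATAMINE .DM or G-EXEC G-STAR format, and it is not how either system was coded: the JSON header layout, the type codes "A" and "N", the function names, and the `keep_unmatched` parameter are all assumptions made up for this example.

```python
import io
import json


def write_table(stream, fields, rows):
    """Write a self-describing table: one metadata header line, then the rows."""
    # Hypothetical header layout: a list of field names and type codes.
    header = {"fields": [{"name": n, "type": t} for n, t in fields]}
    stream.write(json.dumps(header) + "\n")
    for row in rows:
        stream.write(json.dumps(row) + "\n")


def read_table(stream):
    """Read a table back using only the information in its own header."""
    header = json.loads(stream.readline())
    names = [f["name"] for f in header["fields"]]
    rows = [dict(zip(names, json.loads(line))) for line in stream if line.strip()]
    return header["fields"], rows


def has_fields(fields, required):
    """Let a process decide whether a file is appropriate to its task."""
    present = {f["name"] for f in fields}
    return all(name in present for name in required)


def join_process(files, fields, params):
    """Hypothetical process program: relational join of two tables on a key field.

    Files, fields and parameters are passed as three distinct groups.
    """
    left_fields, left_rows = read_table(files["in1"])
    right_fields, right_rows = read_table(files["in2"])
    key = fields["key"]
    if not (has_fields(left_fields, [key]) and has_fields(right_fields, [key])):
        raise ValueError(f"key field {key!r} missing from an input file")

    # Index the second table by key value so each lookup is direct.
    index = {}
    for row in right_rows:
        index.setdefault(row[key], []).append(row)

    # Output fields: all of in1, plus any fields of in2 not already present.
    left_names = {f["name"] for f in left_fields}
    out_fields = left_fields + [f for f in right_fields if f["name"] not in left_names]

    out_rows = []
    for row in left_rows:
        matches = index.get(row[key], [])
        if not matches and params.get("keep_unmatched", False):
            matches = [{}]  # keep the in1 row, with absent values for in2 fields
        for match in matches:
            merged = {**match, **row}  # in1 values win on a name clash
            out_rows.append([merged.get(f["name"]) for f in out_fields])

    write_table(files["out"], [(f["name"], f["type"]) for f in out_fields], out_rows)


# Usage: join an assay table to a collar table on the hole identifier.
holes = io.StringIO()
write_table(holes, [("BHID", "A"), ("X", "N"), ("Y", "N")],
            [["DH1", 100.0, 200.0], ["DH2", 150.0, 250.0]])
assays = io.StringIO()
write_table(assays, [("BHID", "A"), ("AU", "N")],
            [["DH1", 1.2], ["DH1", 0.8], ["DH2", 2.5]])
holes.seek(0)
assays.seek(0)

out = io.StringIO()
join_process(files={"in1": assays, "in2": holes, "out": out},
             fields={"key": "BHID"},
             params={"keep_unmatched": False})
out.seek(0)
out_fields, out_rows = read_table(out)
```

Because the output is itself a self-describing table, it can be passed straight to another process of the same shape, which is how a macro would string such steps together.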

These design principles, together with a large amount of pre-existing code from G-EXEC and elsewhere, made it possible to write DATAMINE within two years: from start-up in February 1981 to a complete, released, saleable product by 1983, with the bulk of the new coding done by just two people. The system development and the underlying principles are discussed in a series of published and unpublished papers (view and download at www.g-exec.com/papers.asp).

For more information, please email me at