THE VMINE FILE FORMATS
A new file structure has been developed for use with VMine:
VMDD/VMDA files are used within the system, providing random access for maximum efficiency of both storage and processing.
The VMDD/VMDA format uses paired files, xxxx.VMDD and xxxx.VMDA.
The VMDD file is a sequential ASCII file which contains the data definition. This includes the file name, number of fields and number of records. For each field, it contains the field name, field type (currently A=alpha or N=numeric but notionally extendable to include other types), field length in bytes for alpha fields, and default value for numeric fields.
The VMDA file is a random access binary file containing all data records. The record length is the sum of three terms: the number of numeric fields multiplied by 8 (they are stored in Fortran double precision format), the total number of bytes in alpha fields, and one additional byte for each field to hold the Codd mark. A non-blank mark indicates absent data, and the particular mark used gives the reason for absence, for example 'A' for simply missing, or 'I' for missing and inapplicable. The use of these marks allows implementation of open-world database management.
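The record length rule above can be illustrated with a short Python sketch. It is not part of VMine: the Field class, the record_length function and the example field names are hypothetical, and only the arithmetic (8 bytes per numeric field, the declared length per alpha field, one mark byte per field) is taken from the description.

```python
from dataclasses import dataclass

NUMERIC_BYTES = 8  # numeric fields are stored as Fortran double precision
MARK_BYTES = 1     # one Codd mark byte per field


@dataclass
class Field:
    """Hypothetical in-memory form of one VMDD field definition."""
    name: str
    ftype: str       # 'A' = alpha, 'N' = numeric
    length: int = 0  # length in bytes, used for alpha fields only


def record_length(fields):
    """Return the fixed VMDA record length in bytes for the given fields."""
    total = 0
    for f in fields:
        if f.ftype == "N":
            total += NUMERIC_BYTES
        elif f.ftype == "A":
            total += f.length
        else:
            raise ValueError(f"unknown field type {f.ftype!r} for {f.name}")
        total += MARK_BYTES  # the Codd mark for this field
    return total


# Illustrative definition: two alpha fields (8 and 5 bytes), three numeric fields.
fields = [
    Field("HOLEID", "A", 8),
    Field("FROM", "N"),
    Field("TO", "N"),
    Field("AU", "N"),
    Field("ROCK", "A", 5),
]

# 3 * 8 numeric bytes + 13 alpha bytes + 5 mark bytes
print(record_length(fields))  # prints 42
```

Because the length is fixed, record number n (counting from 1) would start at byte offset (n - 1) times this value, which is what makes direct random access possible.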
The VMDD file holds the data definition in a sequential formatted file, as follows:
The VMDA file holds data records in random-access binary format, containing NUMREC records of fixed length LENTOTAL bytes, which is defined as:
LENTOTAL = 8 * (number of numeric fields) + (total bytes in alpha fields) + (number of fields)
Because modern dialects of Fortran are strongly typed, the old G-EXEC and DATAMINE storage format, with everything held in a REAL array, is non-standard Fortran and can no longer be guaranteed to be accepted by Fortran compilers. Therefore, in the VMDA file, numeric and character format fields are stored separately. This has the small additional benefit that, for character fields with lengths that are not exact multiples of 4 bytes, there is some space saving by comparison with, for example, Datamine files. Indeed, compared with Datamine 'Extended Precision' files there is substantial space saving, as the 8-byte words of such files contain a maximum of only 4 characters.
Each record in the VMDA file contains three buffers, in the following order:
The "Codd mark" is an indicator for any item in a database that the data value is missing or otherwise undefined. It is much more powerful than the NULL of SQL-based systems as it has a well-defined logical basis and, because it can have many alternative codings, it provides a means to treat databases in a more consistent and meaningful way. In his 1990 book, Codd defined two marks, 'A' for simply missing and unknown, and 'I' for missing and inapplicable. Initially these two marks will be implemented in VMine. The 'I' mark is used in situations where a value is undefined as a result of program execution - for example in an outer join where key fields in the two files being joined cannot be matched.
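The outer join case can be sketched in Python as follows. This is only an illustration under stated assumptions: the left_outer_join function, the dictionary-based records and the example field names are hypothetical and are not VMine's implementation. The only behaviour taken from the text is that a blank mark means the value is present and that fields left undefined by an unmatched key receive the 'I' mark.

```python
BLANK = " "    # blank mark: the value is present
MARK_A = "A"   # simply missing (not used in this example)
MARK_I = "I"   # missing and inapplicable


def left_outer_join(left, right, key, right_fields):
    """Join two lists of records on `key`, keeping every left record.

    Each result is a (values, marks) pair. Fields taken from `right`
    receive the 'I' mark when no right record has a matching key.
    """
    index = {row[key]: row for row in right}
    joined = []
    for row in left:
        match = index.get(row[key])
        values = dict(row)
        marks = {name: BLANK for name in row}
        for name in right_fields:
            if match is None:
                values[name] = None   # no value can be defined
                marks[name] = MARK_I  # undefined as a result of the join
            else:
                values[name] = match[name]
                marks[name] = BLANK
        joined.append((values, marks))
    return joined


# Illustrative data: the second hole has no matching assay record.
collars = [
    {"HOLEID": "DH1", "DEPTH": 120.0},
    {"HOLEID": "DH2", "DEPTH": 85.0},
]
assays = [{"HOLEID": "DH1", "AU": 2.5}]

joined = left_outer_join(collars, assays, "HOLEID", ["AU"])
for values, marks in joined:
    print(values["HOLEID"], values["AU"], repr(marks["AU"]))
```

Keeping the mark separate from the value, as the one-byte-per-field layout does, means a program can tell why a value is absent without reserving a special numeric code for it.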