The description below is new: it was not taken from any Constellation Software Inc copyright material. Even though
I wrote the original version, this description is not taken from the old code but is
reverse-engineered from actual .DM files generated more recently.
However, it cannot be guaranteed that it correctly describes the .DM format as currently used in products
supplied by Constellation Software Inc. If you use this to read files produced by or write files intended to be read
by Constellation Software Inc products, you do so at your own risk.
However, the file structure described below is used to import Datamine files into VMINE,
and it should allow development of software which will read and write .DM
files without the need for separate conversion programs.
This is a random-access file with filename extension .DM. It is organised as 'pages' (these are
the Fortran records) into which the data are mapped. Files exist in two variants: single precision (SP),
with a page length of 2048 bytes (512 4-byte words), and extended precision (EP), with a page length
of 4096 bytes (512 8-byte words).
The first page contains the DD (data definition) and the second and following pages contain the data.
In all pages the last 4 words (16 bytes for SP, 32 bytes for EP) contain security information, but I think this is no
longer used so can probably (no guarantees!) safely be coded as blanks. However, these words
are not available to be used for data, so the effective page size is actually 508 words (2032 bytes for SP, 4064 bytes for EP).
There are two data types, text or alpha ('A') and floating-point numeric ('N').
Some integers are used within the first DD page, but these are all stored as floating-point values.
Data items may either be (a) stored within data records, or (b) file constants whose value is the same
for every record in the file and defined once only in the DD.
Integer items in the data definition page are stored as Fortran REAL*4 values in single-precision (SP) files and REAL*8 values in extended-precision (EP) files.
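As an illustration of this convention, here is a minimal Python sketch (my own, not from any Datamine source) that decodes such a stored integer. Little-endian IEEE byte order is an assumption - it is typical of PC-generated files, but verify against your own data.

```python
import struct

def decode_stored_int(raw: bytes) -> int:
    """Decode an integer stored as a REAL*4 (SP, 4 bytes) or
    REAL*8 (EP, 8 bytes) floating-point value.
    Assumes little-endian IEEE floats."""
    fmt = "<f" if len(raw) == 4 else "<d"
    return int(round(struct.unpack(fmt, raw)[0]))

# e.g. a field count of 12 is stored as the float 12.0:
print(decode_stored_int(struct.pack("<f", 12.0)))
```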
First page structure:
|Bytes (SP)||Bytes (EP)||Content|
|1-8||1-4 and 9-12||File name (max 8 characters) which usually matches the actual file
name (e.g. FILENAME.DM - not case-sensitive)|
|9-16||17-20 and 25-28||Database name (max 8 characters) - I think this is no longer used|
|17-96||33-36,41-44,...,185-188||File description (free text, max 80 characters)|
|97-100||193-200||Numeric date coded as 10000*year + 100*month + day|
|101-104||201-208||Total number of fields in the file (alpha fields are counted as the number of 4-byte blocks they occupy)|
|105-108||209-216||Number of last page in the file|
|109-112||217-224||Number of last logical data record within the last page|
|113-2032||225-4064||Field definitions, each occupying a group of 28 bytes (SP) or 56 bytes (EP).
Alpha fields are always recorded in 4-byte units (in both SP and EP files), and thus for fields wider than 4 bytes more than one field definition is required,
with the same field name but different values of LENF (1,2,...)|
When reading or writing a DD, the position of each field in a logical record is given by the SW value.
|Bytes (SP)||Bytes (EP)||Content|
|1-8||1-4 and 9-12||Field name (max 8 characters)|
|9-12||17-20||Field type ('A ' or 'N ')|
|13-16||25-32||SW Stored word number - set to zero if the field is a 'file constant' (defined by the 'default' value). This is the storage position within a logical data record.|
|17-20||33-40||LENF Word number within field (always 1 for numeric fields, 1,2,3... for successive words of a text field)|
|21-24||41-48||Not used (provision for subsequent inclusion of a code for 'units of measurement' but which was never implemented)|
|25-28||49-56||Default value or file constant value. Default is the value optionally to be used in the event that a data value is missing.|
The length MAXLEN of each logical record is given by the maximum storage-position (SW) value.
MAXLEN is thus the total number of words stored in data fields, counted as one word for
each floating-point value and one for each 4-byte word of text data.
It should be noted that successive words of a text field may not always be contiguous
(i.e. adjacent to each other), but their
positions in each logical data record are given by the storage-position value SW for each word of the field,
allowing the field to be reconstructed correctly
even if its constituent 4-byte words are separated. Every storage position from 1 up to and
including MAXLEN is used by some field.
Fields with storage position = 0 are file constants: they are not stored in the file, and their value for every record is taken from the
default value given in the DD.
|2033-2048||4065-4096||Security information: I think no longer used|
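Putting the layout above together, a reader for the DD page of an SP file might look like the following Python sketch. This is illustrative code, not from any Datamine product; little-endian IEEE floats are assumed, and the names (parse_dd_page, FieldDef) are my own.

```python
import struct
from typing import NamedTuple

PAGE_SIZE = 2048   # SP page size in bytes

class FieldDef(NamedTuple):
    name: str       # field name, up to 8 characters
    ftype: str      # 'A' (alpha) or 'N' (numeric)
    sw: int         # stored word number; 0 => file constant
    lenf: int       # word number within the field
    default: float  # default / file-constant value

def _as_int(raw4: bytes) -> int:
    # DD integers are stored as REAL*4 values
    return int(round(struct.unpack("<f", raw4)[0]))

def parse_dd_page(page: bytes) -> dict:
    """Parse the first (data definition) page of a single-precision .DM file."""
    assert len(page) == PAGE_SIZE
    nfields = _as_int(page[100:104])
    fields = []
    for i in range(nfields):
        off = 112 + 28 * i                      # field definitions start at byte 113
        raw = page[off:off + 28]
        fields.append(FieldDef(
            name=raw[0:8].decode("ascii", "replace").strip(),
            ftype=raw[8:12].decode("ascii", "replace").strip(),
            sw=_as_int(raw[12:16]),
            lenf=_as_int(raw[16:20]),
            default=struct.unpack("<f", raw[24:28])[0],
        ))
    return {
        "filename": page[0:8].decode("ascii", "replace").strip(),
        "database": page[8:16].decode("ascii", "replace").strip(),
        "description": page[16:96].decode("ascii", "replace").strip(),
        "date": _as_int(page[96:100]),          # 10000*year + 100*month + day
        "nfields": nfields,
        "last_page": _as_int(page[104:108]),
        "last_record": _as_int(page[108:112]),
        "fields": fields,
    }
```

Note that alpha fields wider than 4 bytes appear as several consecutive FieldDef entries with the same name and increasing LENF, exactly as described for the DD above.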
The number of logical data records per page is calculated as NLRP =
INT(508/MAXLEN) - thus in general there will be a few unused words at the
end of a page in addition to the 4 words reserved for security data.
The structure of data pages is simply
|Words (4-byte for SP, 8-byte for EP)||Content|
|1 to MAXLEN||Data for first data record within page|
|MAXLEN+1 to 2*MAXLEN||Data for second record|
| ... and so on|| |
|Bytes 2033-2048 (SP) or 4065-4096 (EP)||Security information: I think no longer used|
As many additional pages are used as needed. The last page is unlikely to be filled with data,
and any remaining words are unused and undefined.
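The data-page layout above can be sketched in Python as follows (SP files, little-endian floats assumed; the function names are mine, not Datamine's):

```python
import struct

SEC_WORDS = 4                           # last 4 words of every page are security words
PAGE_WORDS = 512                        # SP: 512 4-byte words per page
USABLE_WORDS = PAGE_WORDS - SEC_WORDS   # = 508

def records_per_page(maxlen: int) -> int:
    """NLRP = INT(508 / MAXLEN)."""
    return USABLE_WORDS // maxlen

def split_records(page: bytes, maxlen: int, count: int):
    """Split an SP data page into `count` logical records, each a list of raw
    4-byte words.  Each word is then interpreted via the DD: an 'N' word as a
    float32 (struct.unpack("<f", word)), an 'A' word as 4 characters of text."""
    records = []
    for r in range(count):
        base = r * maxlen * 4
        records.append([page[base + 4*w: base + 4*w + 4] for w in range(maxlen)])
    return records
```

For the last page, `count` should be the last-record number stored in the DD rather than the full NLRP.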
When writing a file, data are mapped into each logical record according to the SW (storage word)
values for each numeric field or for each word of a text field. These logical records are
accumulated in a page buffer until it is filled (i.e. all NLRP records have been generated),
and the whole page is then written to the file. For the last page, after generating the last
record, the page buffer is written to the file, and the last-page and last-record values
are updated on the DD page, which is then also written to the file. This is why the file must be
specified as random access.
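The page-buffer logic just described can be sketched as follows (SP files; DMPageWriter is a hypothetical name of my own, and the DD-page layout is assumed to be as described earlier):

```python
import io

PAGE = 2048                             # SP page size in bytes

class DMPageWriter:
    """Accumulate SP logical records into page buffers (illustrative sketch).
    Records are lists of MAXLEN raw 4-byte words."""

    def __init__(self, fh, maxlen: int):
        self.fh = fh
        self.maxlen = maxlen
        self.nlrp = 508 // maxlen       # logical records per page
        self.buf = bytearray(PAGE)
        self.in_page = 0                # records accumulated in the current buffer
        self.last_page = 1              # page 1 is the DD page
        fh.seek(PAGE)                   # leave room to rewrite the DD page later

    def add_record(self, words):
        assert len(words) == self.maxlen
        off = self.in_page * self.maxlen * 4
        for w in words:
            self.buf[off:off + 4] = w
            off += 4
        self.in_page += 1
        if self.in_page == self.nlrp:   # page full: write it out
            self.flush()

    def flush(self):
        if self.in_page:
            self.last_record = self.in_page
            self.fh.write(bytes(self.buf))
            self.last_page += 1
            self.buf = bytearray(PAGE)
            self.in_page = 0
```

After the final flush, `last_page` and `last_record` would be written back into the DD page (via `fh.seek(0)`) - the step that makes random access necessary.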
There are a few special numeric codes which are used within the data.
- -1.0E30 = 'bottom', used as the missing-data code for numeric fields (for text fields, missing data is simply all blanks)
- +1.0E30 = 'top', used if a representation of 'infinity' is needed
- +1.0E-30 = 'TR' or 'DL', used if it is required to represent an assay value of 'trace' or 'below detection limit'
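These codes can be defined as constants. Because SP files hold values as REAL*4, a value read back into a double will not compare exactly equal to a Python literal such as -1.0e30, so it is safer to compare after rounding to single precision. A sketch (helper names are my own):

```python
import struct

def f32(x: float) -> float:
    """Round a Python float to the nearest REAL*4 value, as stored in SP files."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

BOTTOM = f32(-1.0e30)   # 'bottom': missing numeric data
TOP    = f32(+1.0e30)   # 'top': stands in for infinity
TRACE  = f32(+1.0e-30)  # 'TR' / 'DL': trace or below detection limit

def is_missing(x: float) -> bool:
    return f32(x) == BOTTOM
```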
All text data are held in REAL variables, not the Fortran CHARACTER type - though the stored format is identical.
This allows use of a simple REAL array to hold a whole page buffer, and another REAL array to hold the whole of each logical
record for writing or reading. This concept originated in the British Geological Survey G-EXEC system in 1972-3 and was the key
to Datamine's generality - rather than needing to pre-define specific data formats for every different combination of text and
numeric fields. The same generality is achieved today through standards such as XML which do not prescribe storage formats or
processing methods. VMINE binary files achieve this generality through a rather different mechanism.