The Domesday Book, compiled in 1086, provided a detailed account of life in England. It was written by hand, on parchment, but the surviving volumes are still in good condition and perfectly readable today. In 1986 the BBC produced a new Domesday Book as a permanent record of England 900 years later. This version was written digitally on a Philips laser disc. It can no longer be read, because the technology is obsolete, and none of the special BBC microcomputers required to read it have survived.
Also in 1986, I visited Mufulira Mine in Zambia. Their entire geological database was held on punched cards in the format required by their Dynamic Ore Reserve System ('DORS'), software written in Fortran for (if I recall correctly) an IBM mainframe computer. The cards occupied a large store room. Do they still survive? I very much doubt it. Certainly the computer will no longer be there, and the software will have been abandoned long since. Possibly the data will have been rescued in time and transferred to magnetic media - maybe in EBCDIC format on 9-track half-inch magnetic tape. That could still be read - in a few computer centres. However, there are huge volumes of data which will by now be lost forever - unreadable because nobody has thought to update the archival media on which they are stored. Probably few 8-inch floppies survive anywhere, but if they do, who remembers the peculiarities of Tektronix 4051 encoding? From the early and mid 1990s I have a stack of backup DAT tapes and 1/4-inch tape cartridges, and no way now to read any of them. I recently transferred large volumes of data from projects during the 1980s and 1990s from 5-1/4-inch floppies to 100-megabyte Zip disks - aware that my only computer with a 5-1/4-inch disk drive is now over 8 years old. Unfortunately these Zip disks too are obsolescent, and I should really back them up on to something more permanent.
Therein lies a problem. During the 1960s and 1970s it was feasible to argue that magnetic media could not be trusted and that a long-term archive should be kept on machine-readable hardcopy media - punched cards or paper tape. Disk space was too expensive for more than short-term storage, and the integrity of data on magnetic tape could not be guaranteed unless the tapes were kept in an air-conditioned environment and periodically re-tensioned and even copied. With the disappearance of cards and paper tape, and the coming of much cheaper magnetic and optical media, these concerns seem to have vanished. However, the problem is probably worse than ever, as the pace of technological change (advance?) continues unabated. No sooner does a standard become generally accepted than it becomes obsolete. The longest-lived format around today is probably the 3-1/2-inch diskette. However, there are signs that this is disappearing, as many new computers are shipped without floppy drives at all. The CD is becoming ubiquitous, and the coming of the CD-RW might be thought to bring a data storage paradise. However, all is not as rosy as it would seem. There are some incompatibilities between the CD-R and CD-RW formats, and both are being challenged by the DVD format. The physical longevity of any of these formats has yet to be demonstrated: the oldest CDs are still no more than 20 years old, and the manufacturers themselves claim a lifetime of no more than 10 to 15 years. What condition will they be in after 900 years?
It has been suggested that the Internet will provide the answer to all of these data storage longevity fears. Somehow data can be put on to an Internet server and its future existence will be guaranteed forever. Redundant technologies, RAID storage, distributed archives, all will solve the problem. Well, it is true that technology may exist to help a great deal. But it requires three things:
(1) the servers must be continually updated and there must be error-free transfer of all data stored on them. This is generally the responsibility of the institutional or commercial service provider.
(2) some record must be kept of the format of the data - this should include full documentation of any standards followed (from encoding standards such as EBCDIC or ASCII up to detailed definition of data formats as generated or required by the software originally used on the data sets).
(3) the owner of the data must be aware of the problem and actually take some action to preserve the data sets.
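The first two requirements can be illustrated with a short sketch in Python. It is only one possible approach, not a recommended archival standard: it copies a file to a new location, confirms that the copy is identical by comparing SHA-256 checksums, and writes a small plain-text record of the encoding and format alongside it. The function names (sha256_of, migrate_with_manifest, verify_archive), the manifest fields and the ".manifest.json" suffix are all hypothetical, chosen for illustration.

```python
import hashlib
import json
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate_with_manifest(source: Path, dest_dir: Path,
                          encoding: str, format_note: str) -> Path:
    """Copy `source` into `dest_dir`, check the copy, and record its format.

    The manifest layout is an assumption made for this sketch.
    """
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / source.name

    checksum_before = sha256_of(source)
    shutil.copyfile(source, target)
    checksum_after = sha256_of(target)
    if checksum_before != checksum_after:
        raise IOError(f"checksum mismatch copying {source} to {target}")

    manifest = {
        "file": source.name,
        "sha256": checksum_after,
        "size_bytes": target.stat().st_size,
        "encoding": encoding,        # e.g. "ascii", or "cp037" for one EBCDIC variant
        "format_note": format_note,  # free-text description of the record layout
    }
    manifest_path = dest_dir / (source.name + ".manifest.json")
    manifest_path.write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return manifest_path


def verify_archive(manifest_path: Path) -> bool:
    """Re-check an archived file against the checksum stored in its manifest."""
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    target = manifest_path.parent / manifest["file"]
    return target.exists() and sha256_of(target) == manifest["sha256"]
```

Calling migrate_with_manifest each time the data are moved to new media, and verify_archive periodically thereafter, addresses only the mechanical part of the problem. The format note is worth no more than the care with which it is written, and the third requirement, an owner who actually does this, cannot be automated.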
It might be argued that if the owner cannot be bothered to preserve his or her data, then the data are probably not worth saving. Yet how many times do you find that you need a particular file or document from a few years ago but either cannot find it or cannot read it? Another argument is that a data set will commonly have outlived its usefulness once it has been used for the particular industrial or research project for which it was collected - and then should be allowed to expire with its storage media. On this basis, the only lasting record should be the report or published paper. However, there is a definite trend towards electronic publishing, which poses the same problem. What may be even more dangerous is the trend in many libraries to reduce their paper collections to microfilm or electronic media - and then to dispose of the original books and periodicals. Already some libraries are finding that their older microfilms are disintegrating after a mere 20 or 30 years. There is a real risk that, like the BBC's Domesday disc, scientific knowledge could be lost forever if we rely on unproven and short-lived media.