Random thoughts

"Assumptions are the things we don't know we're making" - Douglas Adams
What is randomness? This thought has been prompted by the recent discussions on the March 2001 Silicon Dale item on geostatistics. Geostatistics - indeed all of statistics - is based on the concept that at least a part of the behaviour of a thing being studied is modelled well by a random process. C.J. Mann (1970) wrote an incisive analysis of randomness in natural phenomena, in which he made it clear that there are two sorts of randomness. There is the complete, pure, unpredictable sort where even in theory it is impossible to forecast behaviour. This is the sort of randomness you have in quantum theory. And there is the sort of pseudo-randomness where unpredictability is due to the complexity of the behaviour. This is what has been studied more recently by mathematicians in chaos theory and complexity theory.

Does it matter which we are dealing with? The simple answer to this is 'maybe'. Usually it doesn't matter at all. If a thing is unpredictable in practice, then the best model will be statistical. Sedimentology provides some very good examples of this, such as sorting of grains of sand in a river current. However, there are very many geological phenomena which are highly complex but nevertheless follow rules (which may not always be known) that result in behaviour which may not be well modelled by standard statistical models. The formation of many types of mineral deposit can be included in this category. There are well-defined ore-forming processes which can have widely differing results, ranging from simple structures to apparently random chaos, depending on the controlling conditions such as pressure, temperature, fluid flow rates, etc.
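The distinction between the two sorts of randomness can be illustrated with the simplest textbook example of deterministic chaos, the logistic map (this is my illustration, not something from Mann's paper). The rule is completely deterministic, yet two starting points that differ by one part in a billion soon produce entirely different sequences - unpredictable in practice, though not in principle:

```python
# Deterministic chaos: the logistic map x -> r * x * (1 - x).
# A minimal sketch showing how a fully deterministic rule produces
# behaviour that is unpredictable in practice.

def logistic_orbit(x0, r=4.0, n=10):
    """Iterate the logistic map n times starting from x0."""
    xs = [x0]
    for _ in range(n):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

# Two starting points differing by only 1e-9:
a = logistic_orbit(0.2, n=40)
b = logistic_orbit(0.2 + 1e-9, n=40)
# The same seed always gives the same orbit (deterministic), but after
# a few dozen steps the two nearby orbits bear no resemblance to each
# other - the tiny initial error is amplified at every iteration.
```

Run twice with the same seed, the map is perfectly repeatable; sampled without knowing the seed, it would pass for noise. That is exactly the pseudo-randomness Mann describes.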
Mineral deposits are commonly modelled using geostatistical methods. Although there is now a wide selection of these, all have their roots in "regionalized variable theory" as developed by Matheron and other mathematicians. This is a tightly specified framework, with a clear set of assumptions which define the limits of its applicability. The random "regionalized variable" has certain properties such as additivity and stationarity (of one sort or another) - and therefore the data to be modelled should also have these properties. Geostatistics may be distinguished from classical statistics in that observations separated in space are not independent but are auto-correlated - with covariance defined purely by the spatial separation vector between any two observations. The problem is that real data from real deposits do not generally satisfy the assumptions of the original simple geostatistical model. For this reason, progressively more complex geostatistical models have been developed. Although these may allow the data to match the underlying random variable of the statistical model more closely, they also carry their own assumptions and requirements, which are not always understood by users of the methods.
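The idea that covariance depends purely on the separation vector is what the experimental semivariogram measures in practice. As a sketch (with made-up data, not any particular deposit), the semivariance at a given lag is simply half the mean squared difference between pairs of observations separated by roughly that distance:

```python
# A minimal sketch of the experimental semivariogram: half the mean
# squared difference of values, as a function of spatial separation.
# The data below are hypothetical, chosen only to show a trend.

from itertools import combinations
import math

def semivariogram(points, values, lag, tol):
    """Average 0.5*(z_i - z_j)**2 over all pairs of samples whose
    separation distance lies within lag +/- tol."""
    total, count = 0.0, 0
    for i, j in combinations(range(len(points)), 2):
        (x1, y1), (x2, y2) = points[i], points[j]
        h = math.hypot(x1 - x2, y1 - y2)
        if abs(h - lag) <= tol:
            total += 0.5 * (values[i] - values[j]) ** 2
            count += 1
    return total / count if count else float("nan")

# Toy samples on a line, with values drifting steadily in x:
pts = [(float(x), 0.0) for x in range(10)]
vals = [0.1 * x for x in range(10)]
# Semivariance grows with lag for this drifting (non-stationary) data -
# which is precisely the kind of behaviour that strains the stationarity
# assumption discussed above.
```

Note that for data with a deterministic drift, as here, the semivariogram keeps climbing rather than levelling off at a sill - one practical symptom of the assumption failures the column goes on to discuss.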
Maybe it is the random model which is the basic problem. At this year's annual conference of the Irish Association of Economic Geologists, Dr Isobel Clark presented a talk which showed clearly how ignorance of a real deterministic underlying model can lead the geostatistical practitioner into serious errors in modelling even a simple set of geological data. In a more complex situation - say a series of ore shoots in a gold-bearing vein deposit - there could equally well be significant deterministic components to the localisation of high gold grades: for example, the ore shoots could be controlled by the vein geometry, itself a function of the surrounding geology and the directions, timings, and amounts of fault movement. Similarly, the pattern of grades in a roll-front uranium deposit might be controlled deterministically by the rock properties and a set of reaction/diffusion equations. Using a purely statistical method to model such deposits can be quite difficult if the underlying statistical model is itself inappropriate - ill-fitted to the data.
What are we to do when faced with a new data set on a deposit where we don't understand the geological processes well enough to define a deterministic model? The data are there. They may well follow a complex distribution, and they may well appear to violate assumptions required by the statistical model of choice (for example, they might be quite clearly non-stationary). One approach might be to use progressively more 'advanced' geostatistical methods in an attempt to get around these 'problems'. In my experience, the results of doing this are rarely satisfactory, and the complexity of the estimation method detracts from the confidence which one has in the resource model.
Perhaps it is possible to define geological boundaries which separate the deposit into zones with more tractable statistical properties. As Isobel Clark showed at the Irish meeting, though, the danger of doing this is that the boundaries might be purely arbitrary - and false - and separation of the data into different zones could actually make the problem worse. It may be possible to define an underlying process which can be used to subtract a deterministic element from each observation, leaving a "random" residual which can then be modelled statistically. But if even this cannot be done, is there any way to obtain a reasonable resource model? The obvious answer (to a geologist) is to go and look at the rocks and collect more data until you understand the geology. If this is not possible, though, there is an alternative approach which might be worth investigating.
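The simplest version of "subtract a deterministic element and model the residual" is ordinary trend removal. As a sketch only (with invented numbers, and a linear trend standing in for whatever deterministic process is actually at work), fit the trend by least squares, subtract it, and pass the residual on to the statistical model:

```python
# A minimal sketch of removing a deterministic component: fit a linear
# trend z = a + b*x by ordinary least squares, subtract it, and keep
# the residual for statistical modelling. Data are hypothetical.

def fit_linear_trend(xs, zs):
    """Ordinary least squares fit of z = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    mz = sum(zs) / n
    b = (sum((x - mx) * (z - mz) for x, z in zip(xs, zs))
         / sum((x - mx) ** 2 for x in xs))
    a = mz - b * mx
    return a, b

# Toy samples along a traverse: roughly z = 1 + 2x plus small noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
zs = [1.0, 3.1, 4.9, 7.0, 9.1]
a, b = fit_linear_trend(xs, zs)
# The residual is what remains after the deterministic part is removed;
# by construction it averages to zero and carries no linear drift.
residuals = [z - (a + b * x) for x, z in zip(xs, zs)]
```

In a real deposit the deterministic component would rarely be a straight line - it might come from vein geometry or reaction/diffusion behaviour, as above - but the division of labour is the same: deterministic part subtracted, "random" residual modelled statistically.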
There is a body of statistical methods which have been developed to operate with a minimum of assumptions. This is known broadly as nonparametric statistics. One can work simply from basic assumptions: for example, that the data are measured on a continuous scale (i.e. there is no additivity requirement), and allow that there might be spatial autocorrelation of varying extent in different directions and at different locations. No assumption need be made about the distribution of values or of error terms. The model is then a different sort of regionalized variable. Estimation methods can be developed from first principles. What you may lose in raw statistical power with such (nonparametric) methods, you gain in applicability - and hence in confidence in the results.
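To make the flavour of this concrete - and this is my own toy construction, not a method from the nonparametric literature or from this column - here is about the most assumption-free spatial estimator imaginable: the median of the nearest few observations. It assumes nothing about the distribution of values, only that nearby samples are informative:

```python
# A minimal sketch of a distribution-free spatial estimator: the
# median of the k spatially nearest observations. No distributional
# or additivity assumptions; the median is rank-based, so one extreme
# outlier among the neighbours cannot drag the estimate.

import math
from statistics import median

def knn_median_estimate(target, points, values, k=3):
    """Estimate the value at `target` as the median of the values at
    the k nearest sample points."""
    order = sorted(range(len(points)),
                   key=lambda i: math.dist(target, points[i]))
    return median(values[i] for i in order[:k])

# Toy data: three nearby samples and one distant extreme value.
pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
vals = [1, 2, 3, 100]
est = knn_median_estimate((0.1, 0.1), pts, vals, k=3)
# The distant high-grade sample at (5, 5) is not among the three
# nearest neighbours, so it has no influence on the estimate.
```

A rank-based estimator like this sacrifices the statistical efficiency of kriging when the kriging assumptions do hold - but when they do not, it still gives a defensible answer, which is exactly the trade of power for applicability described above.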