Silicon Dale

Procrustes and the Golden Fleece

In Greek mythology, Procrustes was a sadistic robber who tortured his hapless victims by cutting their legs so that they fit his bed if they were too tall - or stretching them if they were too short. He met his just deserts when captured by Theseus and subjected to similar treatment.

The modern gold mining industry is being fleeced by the Procrustean practice of cutting gold grades when estimating resources.

It has become common practice to replace very high assay grades by lower values before computation of resource models. At great expense drillhole samples are collected and assayed, and then the data for those which should be the most interesting - the samples with the highest grades - are arbitrarily discarded and replaced by numbers which have not come from any laboratory but are merely plucked from the air. These data are cut back either to a constant maximum value, or sometimes to a value derived from the straight-line projection of a log-probability plot of a cumulative frequency curve. Why is this done? Because the (geostatistical) computational methods used cannot properly handle values which deviate too far from an 'ideal' distribution. If the very high values were not cut back, it is feared that deposit grades would be over-estimated.

There are established techniques in classical statistics, known as trimming and 'winsorization' (named by John Tukey after his colleague and friend the biometrician Charles Winsor), in which outliers are removed or replaced by other values:

  • The trimmed mean is computed after the k smallest and k largest observations are deleted from the sample. In other words, the observations are trimmed at each end.
  • The Winsorized mean is computed after the k smallest observations are replaced by the (k+1)st smallest observation, and the k largest observations are replaced by the (k+1)st largest observation. In other words, the observations are Winsorized at each end.

For a symmetric distribution, the symmetrically trimmed or Winsorized mean is an unbiased estimate of the population mean. But the trimmed or Winsorized mean does not have a normal distribution even if the data are from a normal population.

It would seem at first sight that the practice of cutting grades is no more nor less than winsorization. However, there are important differences. First, the value substitution occurs only at one end of the distribution. Second, it seems to be done only when the data distribution is far from symmetric - the result is that bias is necessarily introduced. Third, in classical statistics, there are methods to compute robust estimates of variance and other parameters of trimmed or winsorized data sets. These are not available in the standard geostatistical methods that are normally used.
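
The difference can be illustrated numerically. The following is a minimal Python sketch, not taken from any resource-modelling package: the function names, the ten assay values and the 10 g/t cap are all invented for illustration. It computes the trimmed and Winsorized means as defined above, alongside the one-sided 'cut' mean.

```python
def trimmed_mean(values, k):
    """Mean after deleting the k smallest and k largest observations."""
    s = sorted(values)
    kept = s[k:len(s) - k]
    return sum(kept) / len(kept)


def winsorized_mean(values, k):
    """Mean after replacing the k smallest observations by the (k+1)st
    smallest, and the k largest by the (k+1)st largest."""
    s = sorted(values)
    n = len(s)
    low, high = s[k], s[n - k - 1]
    replaced = [low] * k + s[k:n - k] + [high] * k
    return sum(replaced) / n


def top_cut_mean(values, cap):
    """Mean after cutting every value above `cap` back to `cap`.
    Only the high end of the distribution is altered."""
    return sum(min(v, cap) for v in values) / len(values)


# Hypothetical, strongly skewed gold assays in g/t (made-up numbers).
assays = [0.2, 0.3, 0.4, 0.5, 0.6, 0.8, 1.0, 1.5, 3.0, 41.7]

raw_mean = sum(assays) / len(assays)
print(f"raw mean        {raw_mean:.2f}")
print(f"trimmed (k=1)   {trimmed_mean(assays, 1):.2f}")
print(f"Winsorized (k=1){winsorized_mean(assays, 1):.2f}")
print(f"cut at 10 g/t   {top_cut_mean(assays, 10.0):.2f}")
```

With these made-up values the raw mean is 5.0 g/t and the cut mean is 1.83 g/t. The single highest assay carries over 80% of the metal in the sample, and cutting it to 10 g/t removes most of that from the estimate.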

The problem is, of course, that only high grades are cut because the resource modeller is interested only in the higher grade part of a deposit - above mineable cutoff. Unfortunately, in cutting these so-called 'rogue' values, real information is lost about higher grade parts of the deposit which may contain a significant proportion of the total metal content.

If using linear block kriging, it is known that there is a smoothing effect which would smear high grades into estimates of surrounding blocks. It is understandable that users would wish to minimise such problems - and cutting grades can certainly make them less visible. Lognormal geostatistics is known to be sensitive to small discrepancies from a true lognormal distribution - hence if this method is used it is again understandable that there would be a desire to ensure that the input data set doesn't pose any unwelcome problems. There are, of course, geostatistical methods that have been developed to overcome problems of inconvenient data distributions - the 'multigaussian' method for example. However, these methods are neither simple to understand nor simple to use, and they rely on data transformations which destroy the 'BLUE' (best linear unbiased estimator) properties of kriging estimation. A technique that has been used widely in precious metal deposit modelling is multiple indicator kriging (IK). This technique effectively slices up the data distribution into quantiles and so is relatively robust. There is less need (in principle no need at all) to cut grades when using IK. However, indicator kriging has a number of serious problems, both theoretical and practical, and for that reason is regarded even by many geostatisticians as a technique to be avoided.

One thing which we might learn from IK, though, is very instructive. Semivariograms are computed separately for each indicator cutoff grade, and it is very commonly found that the modelled semivariogram ranges are different for different grades: very high grades have shorter ranges than lower grades. In fact, the variance also will be higher in high-grade zones, though this effect cannot be seen in the 1-0 data sets of IK. This is the real nugget effect (because it could actually be caused by real nuggets). Where grades are very high, short-range variance is also very high (and not necessarily displaying any simple numerical relationship with any local 'mean' grade). Where grades are lower, spatial statistics tend to be more regular, and short-range variance is lower. Effectively different parts of the deposit have different semivariograms. The geostatistician's standard response to this is that homogeneous geological zones should be defined, within which kriging can be used. The reality is that often this cannot be done because there is continuous gradation from low to high-grade areas.
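
As a rough illustration of how the indicator semivariograms are obtained, the sketch below codes grades as 1-0 indicators at two cutoffs and computes an experimental semivariogram for each. It assumes regularly spaced samples along a single drillhole; the function names, grades and cutoffs are hypothetical, and real IK software works in three dimensions with irregular spacing and fitted semivariogram models.

```python
def indicator(values, cutoff):
    """Code each grade as 1 if it is at or below the cutoff, else 0."""
    return [1 if v <= cutoff else 0 for v in values]


def semivariogram(values, max_lag):
    """Experimental semivariogram for regularly spaced samples:
    gamma(h) = sum((z[i+h] - z[i])**2) / (2 * N(h)),
    where N(h) is the number of pairs separated by h sample intervals."""
    gammas = {}
    for h in range(1, max_lag + 1):
        pairs = [(values[i], values[i + h]) for i in range(len(values) - h)]
        gammas[h] = sum((a - b) ** 2 for a, b in pairs) / (2 * len(pairs))
    return gammas


# Hypothetical downhole grades in g/t at a constant sample interval.
grades = [0.5, 0.6, 0.7, 2.0, 2.2, 2.5, 2.1, 9.0, 2.3, 0.8, 0.6, 0.5]

low = semivariogram(indicator(grades, 1.0), 3)   # low-grade cutoff
high = semivariogram(indicator(grades, 5.0), 3)  # high-grade cutoff

for h in (1, 2, 3):
    print(f"lag {h}: cutoff 1.0 -> {low[h]:.3f}, cutoff 5.0 -> {high[h]:.3f}")
```

For this invented data the low-cutoff indicator semivariogram rises steadily over the first three lags, while the high-cutoff one is nearly flat from the first lag. That is the behaviour one would expect from a much shorter range for the high grades, though a dozen samples prove nothing.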

The result is that drillhole assay data sets for precious metal deposits in general are not amenable to the more reputable geostatistical techniques, because there is drift not only in the mean grade but also in all of the semivariogram parameters. Nothing is stationary.

Is there any way around this? If geostatistics cannot handle precious metal deposit modelling without the user feeling a need to corrupt the data beforehand, is there some other method which can do the job?

There are two potential candidates which I can see.

The first is to use a statistical method which is not dependent upon the data distribution - in other words, to develop spatial estimation techniques based on nonparametric statistics. I suggested some possible approaches in 1981 (see reference below), but that was also the year I started on development of the Datamine mining software system and neither I nor anyone else has been able to spend much time in pursuing these ideas. Essentially, it is necessary to find a statistical estimation method which is robust with respect to both distribution and drift, and is unaffected by spatial variation in semivariogram properties. There are many avenues worth exploring in this field but their further development would depend on sufficient industrial funding.

The second candidate would be deposit process modelling - simulation of the ore genesis. On a large scale this has been carried out very successfully for predictive exploration by various groups around the world, and this year a new organisation, the Predictive Mineral Discovery Co-operative Research Centre (pmd*CRC), has been established in Australia, as a focus for collaboration among CSIRO, universities, and the mining industry (further information on such work at CSIRO can be found on www.ned.dem.csiro.au/research/structure/ and www.ned.dem.csiro.au/research/solidMech/). On a deposit scale, rather less work has been done. However, it is possible to conceive of approaches which might reward minimal effort. One of these would be perhaps to look at migration of metal from a point or line source, generating a negative exponential spatial distribution. This might be modelled in a real data set very simply by a form of trend surface modelling where the parameters to be estimated are the source location and the decay function parameters. Clearly for such a process simulation approach to become effective for real deposit modelling, close attention would need to be paid to the geological details of the deposit, since simulation could not be done without a thorough understanding of the ore-forming processes.
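
A minimal sketch of the point-source idea follows, under stated assumptions: grade decays as a * exp(-b * d) with distance d from a single point source in two dimensions, all grades are positive, and the source is sought by brute-force search over a grid of candidate positions. For each candidate, the decay parameters come from an ordinary least-squares line through log(grade) against distance. The function name, the synthetic samples and the parameter values are hypothetical; a real application would need geological constraints and a more careful fitting method.

```python
import math


def fit_point_source(samples, candidates):
    """Fit grade = a * exp(-b * distance) from a point source.

    samples    : list of (x, y, grade) tuples, with grade > 0
    candidates : list of (x, y) trial source positions
    Returns (source, a, b) for the candidate giving the smallest
    sum of squared residuals in log(grade).
    """
    logs = [math.log(g) for _, _, g in samples]
    n = len(samples)
    mean_log = sum(logs) / n
    best = None
    for sx, sy in candidates:
        dists = [math.hypot(x - sx, y - sy) for x, y, _ in samples]
        mean_d = sum(dists) / n
        sdd = sum((d - mean_d) ** 2 for d in dists)
        if sdd == 0.0:
            continue  # all samples equidistant: slope is undefined
        slope = sum((d - mean_d) * (lg - mean_log)
                    for d, lg in zip(dists, logs)) / sdd
        intercept = mean_log - slope * mean_d
        sse = sum((lg - (intercept + slope * d)) ** 2
                  for d, lg in zip(dists, logs))
        if best is None or sse < best[0]:
            best = (sse, (sx, sy), math.exp(intercept), -slope)
    if best is None:
        raise ValueError("no usable candidate source position")
    return best[1], best[2], best[3]


# Synthetic, noise-free samples on a 25 m grid (all values invented).
true_a, true_b = 20.0, 0.05
samples = [
    (float(x), float(y),
     true_a * math.exp(-true_b * math.hypot(x - 40.0, y - 60.0)))
    for x in range(0, 101, 25) for y in range(0, 101, 25)
]
# Trial source positions on a 10 m grid.
candidates = [(float(x), float(y))
              for x in range(0, 101, 10) for y in range(0, 101, 10)]

source, a, b = fit_point_source(samples, candidates)
print(source, round(a, 3), round(b, 4))
```

Because the synthetic data contain no noise and the true source lies on the candidate grid, the search recovers the source position and decay parameters that generated them. Real assays would not be so obliging: zero grades cannot be log-transformed, and a line source or several sources would need a different distance function.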

Geostatisticians pay lip service to the importance of geology in estimation of resources. They quite often take care to separate out more or less homogeneous zones for separate geostatistical modelling. However, in reality, the geology should be the principal controlling factor in estimating resources. Real rocks are not random functions, and usually are not well modelled by random functions.

Recognition of this is the key to the whole question of resource estimation. The geology of the deposit should be the real factor controlling the estimation. Often there is insufficient information available, and relatively crude methods such as natural-neighbour, nonparametric methods - or even geostatistics - might be used to try and stretch the available information. From this point of view, clearly the practice of corrupting data by cutting high grades must be deplored and avoided at all costs.

However, where there is some knowledge of the geological setting of the deposit, and particularly where its mode of origin is understood, it should be possible to do modelling in the full sense of the word. A deposit model should ideally take into account the effects of geological processes on the structure and geometry of the deposit and on the localisation of minerals within it.

A geologically based resource modelling approach - unlike geostatistics - will not be a black box. The more mathematical practitioners of geostatistics of course would not be qualified to use such methods, since these require real understanding of real geology. However, this raises the important but contentious question whether such mathematical geostatisticians should be considered qualified to do resource estimation at all.

References:
S. Henley, 1981: Nonparametric Geostatistics. 145pp. Elsevier Applied Science, London. (now out of print, but a few copies remaining, at $40, available from author)
S. Henley & D.F. Watson, 1998: Possible alternatives to geostatistics: APCOM 1998, IMM, London, p.337-354

Stephen Henley
Matlock, England
steve@silicondale.co.uk
www.SiliconDale.com

Copyright © 2001 Stephen Henley
Procrustes and the golden fleece: Earth Science Computer Applications, v.16, no.9, p.1-3