 ### Measurement scales and geostatistics

The entire body of geostatistics has been developed on the basis of Matheron's regionalised variable theory (drawing on earlier work by Matérn and others). Although I have not found it stated explicitly anywhere, it is commonly assumed that geostatistics applies to continuous, interval-scale variables (see the scale definitions in Krumbein and Graybill, 1965). Certainly a valid arithmetic mean is a prerequisite for linear kriging, and for the mean to be valid the addition operator must be valid. In other words, the data must be ‘additive’.
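To make the additivity requirement concrete, here is a minimal ordinary-kriging sketch in Python for one-dimensional data. It assumes a hypothetical exponential variogram with no nugget (the `sill` and `range_a` values are arbitrary) and solves the kriging system by plain Gaussian elimination. The function names are illustrative and do not come from any geostatistics package. Whatever values the weights take, the estimate is a weighted sum of the data, so it means something only if adding the data values means something.

```python
import math
from typing import Callable, List, Sequence, Tuple


def exponential_variogram(h: float, sill: float = 1.0, range_a: float = 10.0) -> float:
    """Hypothetical exponential variogram model with no nugget, so gamma(0) == 0."""
    return sill * (1.0 - math.exp(-h / range_a))


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot (the kriging matrix has zeros on its diagonal).
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ordinary_kriging_1d(
    xs: Sequence[float],
    zs: Sequence[float],
    x0: float,
    variogram: Callable[[float], float] = exponential_variogram,
) -> Tuple[float, List[float]]:
    """Estimate z at x0 as sum(w_i * z_i), with the weights constrained to sum to 1."""
    n = len(xs)
    # Kriging matrix: variogram between data points, bordered by the unbiasedness row/column.
    a = [[variogram(abs(xs[i] - xs[j])) for j in range(n)] + [1.0] for i in range(n)]
    a.append([1.0] * n + [0.0])
    # Right-hand side: variogram between each data point and the target, plus the constraint.
    b = [variogram(abs(x - x0)) for x in xs] + [1.0]
    solution = solve_linear(a, b)
    weights = solution[:n]  # the last element is the Lagrange multiplier
    estimate = sum(w * z for w, z in zip(weights, zs))  # needs a valid addition operator
    return estimate, weights


estimate, weights = ordinary_kriging_1d([0.0, 4.0, 10.0], [2.0, 3.0, 5.0], 6.0)
print(round(estimate, 3), round(sum(weights), 6))
```

Whatever variogram is plugged in, the final line of `ordinary_kriging_1d` is an addition of data values, which is exactly the operation that must be valid for the data.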

There are many types of data for which this additivity requirement is not met.

Krumbein and Graybill (1965, pp. 35-38) identify four measurement scales: nominal, ordinal, interval, and ratio.

• Nominal scales are used for classification and contain no ordering information. Few operations are possible on such data beyond counting the observations in each class. Such data are certainly not suitable subjects for geostatistics. Indicator kriging uses data that have been deliberately degraded to the nominal scale, and in my opinion these are not an appropriate starting point for geostatistical study.

• Ordinal scales, described above, contain ranking information only. Common statistical parameters such as means and variances cannot be defined for such data, so no data whose measurement units are unequally spaced can be subjected to geostatistical procedures. One procedure that has been advocated is rank kriging, in which the ranks of the data are used for variogram computation and kriging, and the cumulative frequency curve is used to back-transform the results. Not only does this introduce bias, but for an arbitrary distribution that bias cannot even be quantified, so no correction factor can be estimated.

• The interval scale is used for data which have equal intervals between units. Means and variances can be computed, and geostatistical methods can be used.

• The ratio scale is a special case of the interval scale in which there is an absolute zero, as in measurements of seam thickness or element concentration. As with interval-scale data, means, variances and geostatistics can all be used. Computed ratios, as described above, are also on the ratio scale, and for these the common statistical measures must be used only with great care. A problem with many types of ratio-scale data (such as element concentrations) is that there is not only an absolute minimum (0%) but also an absolute maximum (100%). These bounds constrain the types of probability distribution that can be used to model the data, and hence the possible modelling methods.
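The bias introduced by rank-based back-transformation shows up even without any spatial modelling. The Python sketch below replaces a small set of hypothetical, strongly skewed grades by their ranks, averages the ranks (the simplest equal-weight ‘estimate’), and back-transforms the result by linear interpolation on the empirical cumulative frequency curve. The helper names and the interpolation rule are illustrative assumptions, not a published rank-kriging implementation, and ties are not handled.

```python
import math
from statistics import mean
from typing import List, Sequence


def to_ranks(values: Sequence[float]) -> List[float]:
    """Replace each value by its rank (1 = smallest). Assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    for rank, index in enumerate(order, start=1):
        ranks[index] = float(rank)
    return ranks


def back_transform(rank: float, values: Sequence[float]) -> float:
    """Map a (possibly fractional) rank back to the data scale by linear
    interpolation between the ordered data values."""
    ordered = sorted(values)
    r = min(max(rank, 1.0), float(len(ordered)))  # clamp to the observed range
    lower = int(math.floor(r))
    upper = min(lower + 1, len(ordered))
    frac = r - lower
    return ordered[lower - 1] + frac * (ordered[upper - 1] - ordered[lower - 1])


grades = [4.0, 1.0, 100.0, 3.0, 2.0]  # hypothetical, strongly skewed
rank_estimate = back_transform(mean(to_ranks(grades)), grades)
print(rank_estimate, mean(grades))  # 3.0 versus 22.0
```

The back-transformed value tracks the median rather than the mean, and the size of the gap depends on the shape of the entire distribution.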

Count data are normally additive (though by definition not continuous). However, in evaluating diamond deposits, a raw count of stones within samples must be treated with caution. First, the samples themselves should be of equal volume, to give equal ‘support’ to all data points. Second, the stones will be of varying sizes and hence of greatly varying value. Statistics on simple stone counts are therefore generally of little practical use. Total carats might be a better variable to use. However, the value of a single 100-carat stone is likely to be very different from that of a hundred 1-carat stones of similar quality: value is not a linear function of stone size. Finally, stone value depends very much on quality. The only additive variable that can be used to develop valid geostatistical models of diamond deposits is the total dollar value of stones in equal sample volumes.
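The argument about counts, carats and value can be illustrated with two equal-volume samples, one holding a single 100-carat stone and the other a hundred 1-carat stones of the same quality. The price model in this Python sketch (value proportional to quality times carats squared) is a purely hypothetical assumption, chosen only to make value nonlinear in size; real diamond pricing is far more complex.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Stone:
    carats: float
    quality: float  # hypothetical quality multiplier; 1.0 = reference quality


def stone_value(stone: Stone, base_price: float = 100.0, size_exponent: float = 2.0) -> float:
    """HYPOTHETICAL price model: value rises faster than linearly with stone size."""
    return base_price * stone.quality * stone.carats ** size_exponent


def summarise(stones: List[Stone], sample_volume_m3: float) -> Dict[str, float]:
    """Three candidate variables for one sample; dividing by the (equal) sample
    volume keeps value_per_m3 on a common support."""
    return {
        "stone_count": float(len(stones)),
        "total_carats": sum(s.carats for s in stones),
        "value_per_m3": sum(stone_value(s) for s in stones) / sample_volume_m3,
    }


big_stone = summarise([Stone(100.0, 1.0)], sample_volume_m3=10.0)
small_stones = summarise([Stone(1.0, 1.0)] * 100, sample_volume_m3=10.0)
print(big_stone)
print(small_stones)
```

Both samples contain 100 carats, the stone counts differ a hundredfold in one direction, and (under this assumed price model) the values differ a hundredfold in the other.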

Data are often recorded on scales that are highly nonlinear, or even purely ordinal (i.e. relative ranks are defined but absolute numerical values are not). Examples of irregular or qualitative measurement scales include Mohs' scale of hardness, whose intervals represent unequal differences in hardness, and the 1-5 or 1-10 preference scales beloved of market researchers. Both are typical ordinal scales, on which parametric statistics cannot be used and means and variances have no meaning.
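One way to see why means have no meaning on an ordinal scale is to relabel it. Any strictly increasing relabelling preserves all the information an ordinal scale contains, yet it can reverse a comparison of means. In this Python sketch the scores for two sites are hypothetical, and the natural logarithm is an arbitrary choice of order-preserving relabelling; the comparison of medians, which depends only on order, is unaffected.

```python
import math
from statistics import mean, median
from typing import Callable, Sequence


def compare(stat: Callable[[Sequence[float]], float], xs: Sequence[float], ys: Sequence[float]) -> int:
    """Return +1, 0 or -1 as stat(xs) is greater than, equal to or less than stat(ys)."""
    a, b = stat(xs), stat(ys)
    return (a > b) - (a < b)


site_a = [1, 1, 5]  # hypothetical ordinal scores (e.g. 1-5 ratings)
site_b = [2, 2, 2]
relabelled_a = [math.log(x) for x in site_a]  # order-preserving relabelling
relabelled_b = [math.log(x) for x in site_b]

print("mean:  ", compare(mean, site_a, site_b), "->", compare(mean, relabelled_a, relabelled_b))
print("median:", compare(median, site_a, site_b), "->", compare(median, relabelled_a, relabelled_b))
```

The mean comparison flips from +1 to -1, while the median comparison stays at -1 under both labellings.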

Other scales, on which the significance of an interval is functionally defined, include pH (the negative base-10 logarithm of hydrogen ion concentration) and phi (the negative base-2 logarithm of grain diameter in millimetres, used for grain size in sedimentary rocks). On such scales the numerical addition operation has no meaning. There is therefore no valid arithmetic mean for anything measured on them and, as a direct consequence, geostatistical methods cannot be used on such data.
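A quick numerical check shows why averaging on a logarithmic scale misleads. The Python sketch below assumes equal volumes of two solutions mixed by simple dilution (no buffering or chemical reaction, with concentration standing in for activity) and compares the pH of the mixture with the arithmetic mean of the two pH values. It then shows that averaging phi values gives the geometric, not the arithmetic, mean diameter. The sample values are hypothetical.

```python
import math
from statistics import mean


def ph_to_h(ph: float) -> float:
    """Hydrogen ion concentration (mol/L) from pH, treating activity as concentration."""
    return 10.0 ** (-ph)


def h_to_ph(h: float) -> float:
    return -math.log10(h)


def mixed_ph(ph1: float, ph2: float) -> float:
    """pH after mixing equal volumes, ASSUMING simple dilution with no buffering or reaction."""
    return h_to_ph((ph_to_h(ph1) + ph_to_h(ph2)) / 2.0)


def phi(diameter_mm: float) -> float:
    """Krumbein phi scale: phi = -log2(diameter in mm)."""
    return -math.log2(diameter_mm)


def diameter_from_phi(phi_value: float) -> float:
    return 2.0 ** (-phi_value)


print(mean([2.0, 6.0]), round(mixed_ph(2.0, 6.0), 2))  # naive 4.0 versus about 2.3

grains_mm = [1.0, 4.0]  # hypothetical grain diameters
print(diameter_from_phi(mean(phi(d) for d in grains_mm)), mean(grains_mm))  # 2.0 versus 2.5
```

The mixture is dominated by the more acidic solution, so its pH lies close to 2 rather than halfway to 6.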

Whenever you are dealing with ratios you are courting trouble. Ratios are additive only when the denominator is constant. Ordinary ratios, such as Mn/Fe or Zn/(Zn+Pb+Cu), cannot be combined by simple arithmetic addition, and you cannot validly use geostatistics on them. A particularly insidious case is an ore grade expressed, for example, as weight percent or grams per tonne, such as the weight percent of zinc in a high-grade ore. The problem here is that the model you are trying to generate is almost always volume-related (since mining blocks are defined as volumetric units), but the density varies with the grade. In this case (and in principle in all ore-grade modelling) you must first compute the mass of metal per unit volume, so that the denominator is the same everywhere: in other words, multiply the grade by the specific gravity before doing any statistics (let alone geostatistics!).
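The effect of ignoring density can be seen with two equal-volume samples. In the Python sketch below the grades and specific gravities are hypothetical, and specific gravity is used numerically as tonnes of ore per cubic metre (water being 1 t/m³). The naive mean of the grades is compared with the grade obtained by first converting each sample to tonnes of metal per cubic metre, which gives every sample the same denominator.

```python
from typing import Sequence, Tuple

Sample = Tuple[float, float]  # (grade as a mass fraction, specific gravity)


def naive_mean_grade(samples: Sequence[Sample]) -> float:
    """Arithmetic mean of mass-fraction grades: ignores the varying denominator (ore mass)."""
    return sum(grade for grade, _ in samples) / len(samples)


def volume_based_grade(samples: Sequence[Sample]) -> float:
    """Grade of the combined volume, assuming every sample represents the same volume."""
    metal_t_per_m3 = [grade * sg for grade, sg in samples]  # additive: same denominator (1 m3)
    ore_t_per_m3 = [sg for _, sg in samples]
    return sum(metal_t_per_m3) / sum(ore_t_per_m3)


samples = [(0.10, 3.0), (0.40, 4.5)]  # hypothetical low- and high-grade zinc ore
print(round(naive_mean_grade(samples), 4), round(volume_based_grade(samples), 4))  # 0.25 versus 0.28
```

The denser, richer sample carries more metal per cubic metre, so the naive mean understates the grade of the combined volume. The same arithmetic explains why a mean of ratios such as Mn/Fe differs from the ratio of the summed quantities unless the denominators are equal.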

Angular data are interesting. A unit vector in two dimensions may be expressed as an angle (in degrees or radians) from a reference direction. Clearly the range of possible values is limited (360° is the same angle as 0°), which means that such data cannot possibly be additive. Even for small ranges of angles there is a problem, in that the vector mean of a set of angles is not in general the arithmetic mean of the angles: it is $\tan^{-1}\left(\sum_i \sin\alpha_i \big/ \sum_i \cos\alpha_i\right)$, where the $\alpha_i$ are the angles to be averaged (evaluated with the two-argument arctangent so that the result falls in the correct quadrant). In three dimensions the problem is compounded by the fact that a unit vector cannot even be represented by a single number.
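The contrast between the arithmetic and vector means of angles is easy to demonstrate. This Python sketch implements the vector-mean formula with `math.atan2`, which returns the angle in the correct quadrant. It treats the angles as directions of unit vectors and raises an error when the resultant is too short for a mean direction to be defined (for example, 0° and 180°). The input bearings are hypothetical.

```python
import math
from typing import Sequence


def arithmetic_mean_deg(angles: Sequence[float]) -> float:
    return sum(angles) / len(angles)


def vector_mean_deg(angles: Sequence[float]) -> float:
    """Mean direction of unit vectors, in degrees in [0, 360)."""
    s = sum(math.sin(math.radians(a)) for a in angles)
    c = sum(math.cos(math.radians(a)) for a in angles)
    if math.hypot(s, c) < 1e-12 * len(angles):
        raise ValueError("resultant vector is (almost) zero: mean direction undefined")
    return math.degrees(math.atan2(s, c)) % 360.0


bearings = [350.0, 30.0]  # hypothetical directions straddling north
print(arithmetic_mean_deg(bearings), round(vector_mean_deg(bearings), 6))  # 190.0 versus 10.0
```

The arithmetic mean points almost due south, while the vector mean correctly bisects the two northerly bearings.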

In summary, it is essential to be aware of the properties of the data set being used before applying any statistical or geostatistical method to it. Many types of data set which superficially might appear suitable for geostatistical analysis are in fact totally unsuitable, while for others the methods must be applied only with great care and the results interpreted with caution.

Stephen Henley
Matlock, England
steve@silicondale.co.uk
www.SiliconDale.com