Kriging
From Wikipedia, the free encyclopedia
Kriging is a regression technique used in geostatistics to approximate or interpolate data. The theory of Kriging grew out of the seminal work of its inventor, Danie G. Krige, and was further developed by Georges Matheron. In the statistical community, it is also known as Gaussian process regression. Kriging is also a reproducing kernel method (like splines and support vector machines).
What is kriging?
Figure: example of one-dimensional data interpolation by Kriging, with confidence intervals
Kriging can be understood as linear prediction or as a form of Bayesian inference.[1] Kriging starts with a prior distribution over functions. This prior takes the form of a Gaussian process: any N samples from the function are jointly normally distributed, and the covariance between any two samples is the covariance function (or kernel) of the Gaussian process evaluated at the spatial locations of the two points. A set of values is then observed, each value associated with a spatial location. A new value can then be predicted at any new spatial location by combining the Gaussian prior with a Gaussian likelihood function for each of the observed values. The resulting posterior distribution is also Gaussian, with a mean and covariance that can be computed directly from the observed values, their variance, and the kernel matrix derived from the prior.
From the geological point of view, Kriging uses prior knowledge about the spatial distribution of a mineral: this prior knowledge encapsulates how minerals co-occur as a function of space. Then, given a series of measurements of mineral concentrations, Kriging can predict mineral concentrations at unobserved points.
Kriging is a family of linear least squares estimation algorithms. The end result of Kriging is the conditional expectation as a best estimate at every unsampled location in a field and, consequently, a minimized error variance at each location. The conditional expectation minimizes the error variance when the optimality criterion is based on least squares residuals. The Kriging estimate is a weighted linear combination of the data. The weights assigned to each known datum are determined by solving the Kriging system of linear equations, in which the weights are the unknown regression parameters. The optimality criterion used to arrive at the Kriging system is, as mentioned above, minimization of the error variance in the least-squares sense.
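The prediction step described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: it assumes a zero-mean prior (the simple kriging case), one-dimensional locations, and a squared-exponential covariance function. The function names `sq_exp_kernel` and `simple_kriging`, the parameters `length_scale`, `variance` and `noise_var`, and the sample data are all hypothetical choices made for this example.

```python
import numpy as np

def sq_exp_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between 1-D location arrays a and b.

    The kernel family and its parameters are assumptions for this sketch.
    """
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

def simple_kriging(x_obs, y_obs, x_new, noise_var=1e-6, **kernel_args):
    """Posterior mean and variance at x_new under a zero-mean Gaussian process prior."""
    # Kernel matrix of the observations, plus the observation noise variance.
    K = sq_exp_kernel(x_obs, x_obs, **kernel_args) + noise_var * np.eye(len(x_obs))
    # Covariance between observed and new locations, shape (n_obs, n_new).
    k_star = sq_exp_kernel(x_obs, x_new, **kernel_args)
    # Kriging system: solve K w = k_star for the weights w.
    weights = np.linalg.solve(K, k_star)
    # The estimate is a weighted linear combination of the data.
    mean = weights.T @ y_obs
    # Error variance: prior variance minus the part explained by the data.
    prior_var = np.diag(sq_exp_kernel(x_new, x_new, **kernel_args))
    var = prior_var - np.sum(weights * k_star, axis=0)
    return mean, var, weights

# Hypothetical observations at four locations; predict at two locations.
x_obs = np.array([0.0, 1.0, 2.5, 4.0])
y_obs = np.array([0.2, 0.9, -0.3, 0.5])
x_new = np.array([1.0, 3.0])
mean, var, weights = simple_kriging(x_obs, y_obs, x_new)
```

With a very small `noise_var`, the prediction at an observed location (here `x_new[0]`) reproduces the observed value almost exactly and its variance is close to zero, while the variance at an unobserved location (`x_new[1]`) is larger. Ordinary and universal kriging additionally estimate a mean or trend, which this sketch leaves out.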
Controversy
Kriging is the gerund derived from the ubiquitous “krige” eponym conferred by Professor Dr G Matheron on Professor D G Krige for his contribution to the new science of geostatistics. When Krige was plotting distance-weighted average gold grades at the Witwatersrand gold complex in South Africa in the early 1950s, he discovered that two or more gold assays, determined in samples selected at positions with different coordinates in a finite sample space, give an infinite set of distance-weighted average gold grades. Krige, Matheron and his students were unaware in those pioneering days that each distance-weighted average gold grade has its own variance because it is a functionally dependent value of the set of measured values. Instead, geostatisticians replaced the true variance of the SINGLE distance-weighted average gold grade with the pseudo variance of a SET of functionally dependent distance-weighted average gold grades that are deprived of degrees of freedom and of variances. In fact, one-to-one correspondence between functionally dependent values and variances is inviolable in mathematical statistics but irrelevant in geostatistics.
There is also some controversy over whether spatial dependence may be assumed or ought to be verified prior to interpolation by kriging. For example, Clark’s hypothetical uranium data in Practical Geostatistics do not display a significant degree of spatial dependence, but the author reports a kriged estimate for some selected coordinates within this sample space anyway. The practice of kriging lends itself to abuse, particularly when applied to a model ore distribution based on the assumption that ore concentrations display a significant degree of spatial dependency in the sample space under examination, which can then be modelled by a Gaussian process.[2] However, some practitioners question the assumption that spatial dependence follows a stochastic process, and that the stochastic process can be correctly estimated from an empirical variogram.[3] Other practitioners recommend using statistical tests to test the assumption of spatial dependency.[4][5][6] For example, in the figure above, the function fits the data perfectly, but the primary data set may not display a statistically significant degree of spatial dependence. Failing to pass a test for spatial dependence would indicate that a constant model cannot be distinguished from a kriging model without further information or knowledge. Armstrong and Champigny's A Study on Kriging Small Blocks cautioned against oversmoothing when the authors noticed that kriging variances converge on the zero kriging variance and kriging covariances on the unity kriging covariance as subsets of kriged estimates converge on the infinite set of kriged estimates. This is unsurprising, because kriged estimates are functionally dependent and deprived of degrees of freedom. All the same, the authors appear to suggest that the requirement of functional independence can be violated a little but not a lot.
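The empirical variogram mentioned in this section can be illustrated with a short NumPy sketch. It uses the classical method-of-moments estimator (half the mean squared difference between pairs of values, grouped by separation distance), which is only one of several possible estimators. The function name `empirical_semivariogram`, the bin edges and the sample data are hypothetical, and the sketch does not perform any of the significance tests for spatial dependence discussed above.

```python
import numpy as np

def empirical_semivariogram(coords, values, bin_edges):
    """Half the mean squared difference of value pairs, per distance bin."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    # Indices of every unordered pair of samples, each pair counted once.
    i, j = np.triu_indices(len(values), k=1)
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    sq_diff = (values[i] - values[j]) ** 2
    n_bins = len(bin_edges) - 1
    gamma = np.full(n_bins, np.nan)   # NaN marks bins without any pairs
    counts = np.zeros(n_bins, dtype=int)
    for b in range(n_bins):
        in_bin = (dist >= bin_edges[b]) & (dist < bin_edges[b + 1])
        counts[b] = in_bin.sum()
        if counts[b] > 0:
            gamma[b] = 0.5 * sq_diff[in_bin].mean()
    return gamma, counts

# Hypothetical samples at unit spacing along a line.
coords = [[0, 0], [1, 0], [2, 0], [3, 0]]
values = [1.0, 2.0, 4.0, 7.0]
gamma, counts = empirical_semivariogram(coords, values, bin_edges=[0.5, 1.5, 2.5, 3.5])
```

A semivariogram that rises with distance, as it does for this toy data, is what spatial dependence would look like. A flat semivariogram would be consistent with the constant model, and bins with few pairs (reported in `counts`) give unreliable estimates.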
Related terms and techniques
A series of related terms were also named after Krige, including kriged estimate, kriged estimator, kriging variance, kriging covariance, zero kriging variance, unity kriging covariance, kriging matrix, kriging method, kriging model, kriging plan, kriging process, kriging system, block kriging, co-kriging, disjunctive kriging, linear kriging, ordinary kriging, point kriging, random kriging, regular grid kriging, simple kriging and universal kriging.
See also: Sampling variogram, variogram (also known as a semivariogram).
References
- ^ Williams, Christopher K.I. (1998). “Prediction with Gaussian processes: From linear regression to linear prediction and beyond”, in M. I. Jordan (ed.), Learning in Graphical Models. MIT Press, 599-612.
- ^ Cressie, Noel A.C. (1993). Statistics for Spatial Data. Wiley-Interscience.
- ^ Philip, G. M., Watson, D.F. (1986). "Matheronian Statistics --- Quo vadis?". Mathematical Geology 18 (1): 93-117.
- ^ Fortin, Marie-Josee, Dale, Mark R.T. (2005). Spatial Analysis: A Guide for Ecologists.
- ^ Ullah, Aman, Giles, David E.A. (eds.) (1998). Handbook of Applied Economic Statistics.
- ^ Schabenberger, Oliver, Pierce, Francis J. (2001). Contemporary Statistical Models for the Plant and Soil Sciences.
Historical references
- Armstrong, M and Champigny, N, 1988, A Study on Kriging Small Blocks, CIM Bulletin, Vol 82, No 923
- Armstrong, M, 1992, Freedom of Speech? De Geostatisticis, July, No 14
- Champigny, N, 1992, Geostatistics: A tool that works, The Northern Miner, May 18
- Hald, A, 1952, Statistical Theory with Engineering Applications, John Wiley & Sons, New York
- Lipschutz, S, 1968, Theory and Problems of Probability, McGraw-Hill Book Company, New York
- Volk, W, 1980, Applied Statistics for Engineers, Krieger Publishing Company, Huntington, New York
- Youden, W J, 1951, Statistical Methods for Chemists, John Wiley & Sons, New York