William S. Harlan
Date: July 2013
A casual encounter with geostatistics can be baffling because of some non-standard terminology. As it turns out, “Kriging,” the core numerical method of geostatistics, can be derived on a napkin, if you are already familiar with some standard least-squares methods.
If you want to appreciate geostatistical applications, then try the popular book “An introduction to Applied Geostatistics," by Isaaks and Srivastava . If you want a clean mathematical notation, then try “Multivariate Geostatistics: An introduction with Applications," by Wackernagel ;
Kriging is just the least-squares solution of a purely under-determined linear inverse problem. Once you see this equivalence, you can see some simple ways to improve distance-weighted interpolation methods.
Geostatistics poses a useful problem that you are unlikely to have encountered elsewhere in least-squares. The solution looks very familiar, but it has unique elements that make it very useful.
Assume that you have a continuous function of a spatial vector . Usually has two dimensions, but nothing about the derivation limits us to two dimensions.
The actual function is unknown and needs to be reconstructed from a collection of samples at arbitrarily chosen locations.
We will interpolate a value at an unsampled location
as a linearly weighted sum of the known values , at sampled locations
Our problem: how do we optimally estimate weights given what we know about the distribution of ?
We will make a few more assumptions appropriate to a linear least-squares solution.
We will treat all our values as random variables. Without loss of generality (changing variables if necessary), we will assume these variables have zero expected mean:
We will find it convenient to construct a covariance
matrix for our measured values:
Define a covariance vector
for our unknown value relative to the measured values:
Much of geostatistics concentrates on the estimation of these covariances. For now we assume they are known.
Notice that none of our assumptions require that our interpolated value share the same physical units as the values we are interpolating from. Co-kriging differs from Simple Kriging largely by combining values with mixed units. (See Wackernagel for a demonstration of their equivalence.)
Define an estimation error
as the expected
error between the true value and our interpolated value.
Naive weights would simply use the covariance vector between values at the known and the unknown locations. Because the known locations are also correlated with each other, we must remove that effect by multiplying by their inverse covariance .
Once we have our estimated weights
, we can explicitly quantify the error in our estimate.
This will allow us to put Gaussian error bars on each interpolated value.
Substitute our solution (10) into our estimation error (8) for
Here is the most geostatistical assumption of all.
The covariance between values depends only on the vector distance between their sampled positions:
In the absence of other information, it would be appealing to choose a scale-invariant version of a covariance. Scale-invariance allows covariances to be independent of the units chosen for spatial distance. Covariance would similar at any scale, like a fractal.
Unfortunately, this assumption leads to a degenerate solution. Scale-invariance requires covariances to go to infinity at a distance of zero, which should have a finite variance . As we approach infinity, the covariance matrix becomes diagonal like an identity matrix, and so the inverse covariance has no effect on weights.
We see this degeneracy with a popular interpolation known as Shepard's “Inverse Distance Weighting”:
If we are assuming a specific correlation between our measured and interpolated values, then we ought to be able to assume the same correlation between measured values.
Unfortunately, this covariance approaches infinity at a zero distance:
As it turns out, the degeneracy is easily remedied if we sacrifice scale invariance
below the density of our samples. Choose a small distance below which the
correlation does not increase any further.
Let's see how this assumption works with a simple example.
One value is twice as far as from the interpolated point . Specifically and .
If we are using Shepard's interpolation, then we already know enough to estimate the weights as .
However, it should matter very much if the point we are interpolating is between the two measured points, or on the same side of both.
For a non-degenerate covariance, we will assume in our covariance (17).
On opposite sides of the interpolated point, the two measured points are weakly correlated with . We get adjusted weights of , which is very close to the Shepard's interpolation weights.
On the same side of the interpolated point, the two measured points are strongly correlated with . The value is actually better correlated with than it is with the interpolated . We expect the nearer value to override and diminish the contribution of . And sure enough, the adjusted weights are actually , which ignores the second point entirely.
Most geostatistics literature prefers “variograms” to covariances.
This is one of the biggest and most unnecessary obstacles to newcomers.
The two quantities have a simple equivalence:
The geostatistical assumption (13) can be rewritten
The most popular of standard analytic variograms is an exponential:
A user must prepare scatterplots and histograms to scan possible values for the constant , then choose a best fitting value before kriging can begin. For this sort of analysis, variograms have proven more popular than covariances.
It is also possible to recognize Kriging as a special case of a least-squares inverse problem, without postulating the specific weighted solution (2). I thank Dave Hale  for explaining this equivalence to me. Kriging is a purely underdetermined inverse problem, which makes it a bit unusual and less familiar.
We assume that we have a model vector
and a data vector
The data are assumed to be a linear transform (matrix multiplication)
of the model,
plus a vector
Since this objective function is a quadratic in
, we can find an optimum
by setting the derivative to zero:
This alternate solution (30) is less often used because it requires a matrix inversion in the data space rather than in the model space. Most typically, an overdetermined problem has a greater number of data values than model values. In the case of Kriging, however, the situation is reversed.
Others such as Hansen et al  have observed that this alternate solution (30) is equivalent to Kriging. We will elaborate below.
In the case of geostatistical Kriging, we assume that our model is a large dense grid of regularly distributed values. The forward operator merely subsamples that grid for the values . The dimensionality of is possibly orders of magnitude smaller than that of the model . The transform is a rectangular matrix that has many more columns than rows. Each row contains mostly zeros and a single column with a value of 1. Most columns do not have a value of 1 on any row.
Applying the transposed operator to subsampled data merely repopulates the sampled values of and leaves all other values set to 0. This is also the right inverse of the original transform, so that .
If we have sampled the model directly, there is no need to assume any noise at all. If we allow , then the usual estimate (29) becomes degenerate, and . Rather than interpolating any unsampled values, this estimate sets them all to zero.
The alternate form (30) lets us set the noise variance directly to 0.
We can then find the following covariances of our sampled data