Semivariogram methods for modeling
Whittle-Matérn priors in Bayesian inverse problems
Richard D Brown1, Johnathan M Bardsley1 and Tiangang Cui2
1 Department of Mathematical Sciences, University of Montana, Missoula, MT
59812, United States
2 School of Mathematics, Monash University, Melbourne, Australia
E-mail: rick.brown@umontana.edu, bardsleyj@mso.umt.edu,
Tiangang.Cui@monash.edu
Abstract.
We present a new technique, based on semivariogram methodology, for obtaining point estimates for use in prior modeling for solving Bayesian inverse problems. This method requires a connection between Gaussian processes with covariance operators defined by the Matérn covariance function and Gaussian processes with precision (inverse-covariance) operators defined by the Green's functions of a class of elliptic stochastic partial differential equations (SPDEs). We present a detailed mathematical description of this connection. We show that the two Gaussian processes are equivalent when the domain is infinite (for us, $\mathbb{R}^2$), and that this equivalence breaks down when the domain is finite because of the effect of boundary conditions on Green's functions of PDEs. We show how this connection can be re-established using extended domains. We then introduce the semivariogram method for estimating the Matérn covariance hyperparameters, which specify the Gaussian prior needed for stabilizing the inverse problem. Results are extended from the isotropic case to the anisotropic case where the correlation length in one direction is larger than another. Finally, we consider the situation where the correlation length is spatially dependent rather than constant. We implement each method in two-dimensional image inpainting test cases to show that it works on practical examples.
Keywords: inverse problems, variogram, Bayesian methods, boundary conditions, Whittle-Matérn, stochastic partial differential equations, Gaussian field
1. Introduction
Inverse problems are ubiquitous in science and engineering. They are characterized by
the estimation of parameters in a mathematical model from measurements and by a
high-dimensional parameter space that typically results from discretizing a function
deﬁned on a computational domain.
For typical inverse problems, the process of
estimating model parameters from measurements is ill-posed, which motivates the use
of regularization in the deterministic setting and the choice of a prior probability density
in the Bayesian setting. In this paper, we consider linear models of the form
$$b = Ax + \epsilon, \qquad \epsilon \sim N(0, \lambda^{-1} I_M), \tag{1}$$
arXiv:1811.09446v3  [math.NA]  9 May 2020

Semivariogram methods for inverse problems
where $b \in \mathbb{R}^M$ is the vector of measurements, $A \in \mathbb{R}^{M \times N}$ is the forward model matrix, $x \in \mathbb{R}^N$ is the vector of unknown parameters, and $\epsilon \sim N(0, \lambda^{-1} I_M)$ is the observation noise, which follows a zero-mean Gaussian distribution with covariance matrix $\lambda^{-1} I_M$, with $I_M$ denoting the $M \times M$ identity. In typical inverse problems, $Ax$ is the discretization of a continuous forward model $\mathcal{A}x$, where $\mathcal{A}$ is a linear operator and $x$ is a function. The components of the vector $x$ satisfy $x_i = x(u_i)$, where $u_i \in \mathbb{R}^d$ is the location of the $i$th element of the numerical grid. The random vector $b$ in (1) has conditional probability density function
$$p(b|x, \lambda) \propto \exp\left(-\frac{\lambda}{2}\|Ax - b\|^2\right), \tag{2}$$
where $\propto$ denotes proportionality and $\|\cdot\|$ denotes the $\ell_2$-norm.
The maximizer of
p(b|x, λ) with respect to x is known as the maximum likelihood estimator, and we
denote it by xML. As stated above, due to ill-posedness, xML is unstable with respect
to errors in b, i.e., small changes in b result in large relative changes in xML.
There are various methods to stabilize the solution of inverse problems, but they
all involve some form of regularization. In this paper, we take the Bayesian approach
[1], which requires the deﬁnition of a prior probability density function on x. We make
the assumption that the prior is Gaussian of the form $x \sim N(0, (\delta P)^{-1})$, which has probability density function
$$p(x|\delta) \propto \exp\left(-\frac{\delta}{2} x^T P x\right), \tag{3}$$
where $P$ is the precision (inverse-covariance) matrix.
Now that we have defined the prior (3) and the likelihood (2), using Bayes' law, we multiply them together to obtain the posterior density function
$$p(x|b, \lambda, \delta) \propto p(b|x, \lambda)\,p(x|\delta) \propto \exp\left(-\frac{\lambda}{2}\|Ax - b\|^2 - \frac{\delta}{2} x^T P x\right), \tag{4}$$
whose maximizer, $x_{\lambda,\delta}$, is known as the maximum a posteriori (MAP) estimator. The MAP estimator can be equivalently expressed as
$$x_{\lambda,\delta} = \arg\min_x \left\{\frac{\lambda}{2}\|Ax - b\|^2 + \frac{\delta}{2} x^T P x\right\}.$$
Our primary focus in this paper is to provide formulations and hyperparameter selection
techniques for prior precision matrices that have an intuitive interpretation and can be
used to solve a wide variety of problems.
1.1. The Matérn Class of Covariance Matrices and Whittle-Matérn Priors
It remains to define the prior covariance matrix $C = P^{-1}$. The Matérn class of covariance matrices has garnered much praise [2] for its flexibility in capturing many covariance structures and its allowance of direct control of the degree of correlation in
the vector $x$ [3]. The Matérn covariance matrix is defined by the Matérn covariance function, which was first formulated by Matérn in 1947 [4],
$$C(r) = \sigma^2\,\frac{(r/\ell)^{\nu} K_{\nu}(r/\ell)}{2^{\nu-1}\Gamma(\nu)}, \tag{5}$$
where $r$ is the separation distance; $K_\nu(\cdot)$ is the modified Bessel function of the second kind of order $\nu$ [5]; $\Gamma(\cdot)$ is the gamma function; $\ell > 0$ is the range parameter; $\nu > 0$ is the smoothness parameter; and $\sigma^2$ is the marginal variance. Omitting $\sigma^2$ gives the Matérn correlation function. In the isotropic case, when the covariance depends only on the distance between elements, given the covariance parameters $\sigma^2$, $\nu$, and $\ell$, one can obtain the covariance matrix $C$ of a vector $x = [x_1, \ldots, x_N]^T$ with spatial positions $\{u_1^T, \ldots, u_N^T\} \subset \mathbb{R}^d$ by letting
$$[C]_{ij} = \mathrm{Cov}(x_i, x_j) = C(\|u_i - u_j\|),$$
where the function $C(\cdot)$ is defined by (5).
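As an illustration of how (5) turns a set of grid positions into a dense covariance matrix, the following Python sketch may help. It is not taken from the paper: the function names `matern_cov` and `matern_cov_matrix`, the 10 × 10 grid, and the parameter values are assumptions chosen for the example, and the value at $r = 0$ is set to the limit $\sigma^2$.

```python
import numpy as np
from scipy.special import gamma, kv

def matern_cov(r, sigma2=1.0, nu=1.0, ell=1.0):
    """Matern covariance (5) evaluated at an array of distances r >= 0."""
    s = np.asarray(r, dtype=float) / ell
    c = np.full(s.shape, float(sigma2))      # limit of (5) as r -> 0 is sigma^2
    pos = s > 0
    c[pos] = sigma2 * s[pos] ** nu * kv(nu, s[pos]) / (2.0 ** (nu - 1.0) * gamma(nu))
    return c

def matern_cov_matrix(U, sigma2=1.0, nu=1.0, ell=1.0):
    """Dense matrix [C]_ij = C(||u_i - u_j||) for positions U of shape (N, d)."""
    dist = np.linalg.norm(U[:, None, :] - U[None, :, :], axis=-1)
    return matern_cov(dist, sigma2, nu, ell)

# Hypothetical example: a 10 x 10 grid on the unit square.
g = np.linspace(0.0, 1.0, 10)
U = np.stack(np.meshgrid(g, g, indexing="ij"), axis=-1).reshape(-1, 2)
C = matern_cov_matrix(U, sigma2=1.0, nu=1.0, ell=0.25)
```

The matrix is $N \times N$ and dense, which is what makes this direct construction impractical for large grids.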
The parameters of the Matérn covariance function are not as straightforward to interpret as the parameters of some other covariance functions. When $\nu$ is small ($\nu \to 0^+$), the spatial process is said to be rough, and when it is large ($\nu \to \infty$), the process is smooth [3, 6]. Figure 1 shows how the covariance function behaves with different values of $\ell$ and $\nu$: on the left, $\ell = \sigma^2 = 1$ and $\nu$ varies, while on the right $\nu = \sigma^2 = 1$ and $\ell$ varies. Note that as $\nu$ increases, the behavior at small lags changes, leading to more correlation at smaller distances and a larger practical range, which is defined to be the distance at which the correlation is equal to 0.05. In Figure 1, this is the distance at which the covariance function intersects the horizontal line. Meanwhile, as $\ell$ decreases, the decay rate of the covariance increases considerably, which decreases the practical range. Although $\ell$ is known as the range parameter, the parameter $\nu$ also affects the practical range. In [7], a range approximation $\rho = \ell\sqrt{8\nu}$ is used, where $C(\rho) \approx 0.10$.
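The practical range has no closed form for general $\nu$, but it can be found numerically as the root of the correlation function minus 0.05. The sketch below is one way to do this with a bracketing root finder; the function names and the bracket $[10^{-3}\ell, 100\ell]$ are assumptions made for the example. It also evaluates the correlation at the rule-of-thumb distance $\rho = \ell\sqrt{8\nu}$ so the two notions of range can be compared.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import gamma, kv

def matern_corr(r, nu, ell):
    """Matern correlation, i.e. (5) with sigma^2 omitted, for a scalar r > 0."""
    s = r / ell
    return s ** nu * kv(nu, s) / (2.0 ** (nu - 1.0) * gamma(nu))

def practical_range(nu, ell, level=0.05):
    """Distance at which the Matern correlation drops to `level`."""
    # The correlation decreases from 1 towards 0, so this bracket contains one root.
    return brentq(lambda r: matern_corr(r, nu, ell) - level, 1e-3 * ell, 100.0 * ell)

nu, ell = 2.0, 0.019                      # example values
r_prac = practical_range(nu, ell)         # distance with correlation 0.05
rho = ell * np.sqrt(8.0 * nu)             # range approximation used in [7]
corr_at_rho = matern_corr(rho, nu, ell)   # correlation at that distance
```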
Despite the benefits of using the Matérn class of covariance matrices, its use can be problematic for inverse problems because computing the precision matrix $P$, which is what appears in the posterior (4), requires inverting a dense $N \times N$ matrix. Using the fast Fourier transform (FFT) [8, 9, 10] to operate with $P$ and $C$ more efficiently is recommended if $x$ is defined on a regular grid and periodic boundary conditions are assumed. In other cases, it is useful that the Matérn covariance function has a direct connection to a class of elliptic SPDEs [7] whose numerical discretization yields sparse precision matrices, $P$, that are computationally feasible to work with even when $N$ is large. Connections of this type were first shown to exist by Whittle in [11], where he showed the connection held for a special case of the Matérn covariance class. Hence, priors that depend on this connection are often referred to as Whittle-Matérn priors. The connection between the general Matérn covariance function and SPDEs has been used in a wide range of applications for defining computationally feasible priors for high-dimensional problems [12, 13, 14]. Moreover, work has been done in establishing convergence theorems for, and lattice approximations of, these Whittle-Matérn priors [15].
Figure 1. Behavior of the Matérn covariance function. The smoothness parameter, $\nu$, primarily affects the covariance at small distances whereas the range parameter, $\ell$, mainly affects the decay rate of the covariance. The horizontal line corresponds to a covariance value of 0.05 and the practical range is the distance at which the covariance intersects this line.
The remainder of the paper is organized as follows. In Section 2, we describe in detail the connection between zero-mean Gaussian processes with the isotropic Matérn covariance operator and those that arise as solutions of a class of elliptic SPDEs. In Section 3, we show how to estimate the hyperparameters in the isotropic Whittle-Matérn prior using the semivariogram method, and then we show how to use this approach to define the prior when solving a Bayesian inverse problem. In Section 4, we extend these ideas to the anisotropic case and then we consider images with regions that require different covariance structures in Section 5. For each section, we present numerical tests on two-dimensional image inpainting test cases. We end with conclusions in Section 6.
2. Whittle-Matérn Class Priors via SPDEs
In this section, we will show that the Whittle-Matérn class of priors can be specified as the solution of the SPDE
$$(1 - \ell^2\Delta)^{\beta/2} x(u) = W(u), \qquad u \in \mathbb{R}^d, \quad \beta = \nu + d/2, \quad \ell, \nu > 0, \tag{6}$$
where $\Delta = \sum_{i=1}^{d} \partial^2/\partial u_i^2$ is the Laplacian operator in $d$ dimensions, and $W$ is spatial Gaussian white noise with unit variance, which we define below.
Although this
connection has been shown to exist [11, 7, 13], here we provide a signiﬁcantly more
detailed derivation of this result than we have seen elsewhere. Our derivation is based
on the Green’s function of the diﬀerential operator. For other linear operators with
suﬃcient smoothness, e.g., the one in the Stokes equations and the one in the heat
equation, the corresponding SPDEs can be used to deﬁne diﬀerent Gaussian processes
[16]. The method we employ here provides a potential way to derive the covariance
functions of the Gaussian processes induced by other linear SPDEs as well.

2.1. Preliminary Deﬁnitions
Before deriving the solution of (6), we need some preliminary deﬁnitions.
2.1.1. Gaussian Fields
A stochastic process $\{x(u), u \in \Omega\}$, with $\Omega \subset \mathbb{R}^d$, is a Gaussian field [17] if for any $k \ge 1$ and any locations $u_1, \ldots, u_k \in \Omega$, $[x(u_1), \ldots, x(u_k)]^T$ is a normally distributed random vector with mean $\mu = [E[x(u_1)], \ldots, E[x(u_k)]]^T$, where $E[\cdot]$ denotes expected value, and covariance matrix $[C]_{ij} = \mathrm{Cov}(x(u_i), x(u_j)) = E[(x(u_i) - E[x(u_i)])(x(u_j) - E[x(u_j)])]$, for $1 \le i, j \le k$. The covariance function is defined as $C(u_i, u_j) := \mathrm{Cov}(x(u_i), x(u_j))$. It is necessary that the covariance function is positive definite, i.e., for any $\{u_1, \ldots, u_k\}$, with $k \ge 1$, the covariance matrix $C$ defined above is positive definite. The Gaussian field is called stationary if the mean is constant and the covariance function satisfies $C(u, v) = C(u - v)$ and isotropic if $C(u, v) = C(\|u - v\|)$.
2.1.2. White Noise
The term white noise [16, 18] comes from light. White light is a
homogeneous mix of wavelengths, as opposed to colored light, which is a heterogeneous
mix of wavelengths. In a similar way, white noise contains a homogeneous mix of all
the diﬀerent basis functions. The mixing of these basis functions is determined by a
random process. When this random process is Gaussian, we have Gaussian white noise.
Consider a domain $\Omega$ and let $\{\varphi_j : j = 1, 2, \ldots\}$ be an orthonormal basis of $L^2(\Omega)$, where $L^2(\Omega) = \left\{f : \Omega \to \mathbb{R} \mid \int_\Omega |f(x)|^2\,dx < \infty\right\}$. Then Gaussian white noise is defined by
$$W(u) = \sum_{j=1}^{\infty} \xi_j \varphi_j(u), \qquad \xi_j \overset{iid}{\sim} N(0, \eta^2). \tag{7}$$
If we are dealing with spatial Gaussian white noise with unit variance, then $u$ refers to location and $\eta^2 = 1$. With this definition, it is clear that Gaussian white noise has mean zero: $E[W(u)] = \sum_{j=1}^{\infty} E[\xi_j]\varphi_j(u) = 0$. Moreover, one can show that $\mathrm{Cov}(W(u), W(v)) = \eta^2\delta_f(u - v)$, where $\delta_f(\cdot)$ is the Dirac delta function [19], also known as the delta distribution. We include the subscript $f$ to differentiate the delta function from the $\delta$ hyperparameter used elsewhere in this paper. A well-known and very important property of the Dirac delta function is that it satisfies the sifting property: $f(u) = \int_{\mathbb{R}^d} \delta_f(u - v) f(v)\,dv$.
2.1.3. Green's Functions
We now consider differential equations of the form $Lx(u) = f(u)$, $u \in \mathbb{R}^d$, where $L$ is a linear differential operator. A Green's function [20, 21], $g$, of $L$ is any solution of $Lg(u, v) = \delta_f(u - v)$. Using the Green's function, the solution of the equation $Lx(u) = f(u)$ can be written as
$$x(u) = \int_{\mathbb{R}^d} g(u, v) f(v)\,dv. \tag{8}$$
2.2. The Gaussian Field Solution of the SPDE (6)
In this subsection, we will prove the following theorem concerning the solution of the
SPDE (6).
Theorem 1 The solution $x(u)$ of (6) is a Gaussian field with mean zero and Matérn covariance function defined by (5).
Proof. To begin, we note that the Green's function for (6) is the solution of
$$(1 - \ell^2\Delta)^{\beta/2} g(u, v) = \delta_f(v - u). \tag{9}$$
Using (8), the solution to (6) is given by
$$x(u) = \int_{\mathbb{R}^d} g(u, v) W(v)\,dv, \tag{10}$$
making $x(u)$ a Gaussian field since it is a linear transformation of Gaussian white noise.
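We now compute the mean and covariance of the Gaussian field, $x(u)$, defined by (10). Since the Green's function is a strictly positive, symmetric, and rapidly decaying function, we can apply Fubini's theorem [22] to obtain the mean of $x(u)$:
$$E[x(u)] = E\left[\int_{\mathbb{R}^d} g(u, v) W(v)\,dv\right] = \int_{\mathbb{R}^d} g(u, v)\,E[W(v)]\,dv = 0.$$
Since $x(u)$ has mean zero, the covariance is given by
$$\begin{aligned}
\mathrm{Cov}(x(u), x(u')) &= E[x(u)x(u')] \\
&= \int_{\mathbb{R}^d} \left(\int_{\mathbb{R}^d} E[W(v)W(v')]\,g(u, v)\,dv\right) g(u', v')\,dv' \\
&= \int_{\mathbb{R}^d} \left(\int_{\mathbb{R}^d} \delta_f(v - v')\,g(u, v)\,dv\right) g(u', v')\,dv' \\
&= \int_{\mathbb{R}^d} g(u, v')\,g(u', v')\,dv'.
\end{aligned}$$
If we define $C(u, u') := \mathrm{Cov}(x(u), x(u'))$, the previous result implies that if $L = (1 - \ell^2\Delta)^{\beta/2}$, then for our linear $L$ acting only on $u'$,
$$\begin{aligned}
LC(u, u') &= L\int_{\mathbb{R}^d} g(u, v')\,g(u', v')\,dv' \\
&= \int_{\mathbb{R}^d} \left[Lg(u', v')\right] g(u, v')\,dv' \\
&= \int_{\mathbb{R}^d} \delta_f(u' - v')\,g(u, v')\,dv' \\
&= g(u, u').
\end{aligned} \tag{11}$$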
To derive the Green's function $g$ in (11), we first define $g(u) := g(u, 0)$. Then (9) implies
$$(1 - \ell^2\Delta)^{\beta/2} g(u) = \delta_f(u). \tag{12}$$
To proceed, we must take the Fourier transform [23, 24] of both sides of (12). This yields
$$(1 + \ell^2\|\omega\|^2)^{\beta/2}\,\hat{g}(\omega) = 1,$$
where $\omega \in \mathbb{C}^d$ are the coordinates in the Fourier-transformed space and the hat notation ($\hat{f}$) denotes the Fourier transform of a function $f$. Thus, the Fourier transform of the Green's function is
$$\hat{g}(\omega) = (1 + \ell^2\|\omega\|^2)^{-\beta/2}. \tag{13}$$
Next, we assume stationarity so that the covariance only depends on the relative locations of the points, i.e., $\mathbf{r} := u - v$. Then $E[x(u)x(v)] = E[x(\mathbf{r})x(0)] = C(\mathbf{r}, 0) := C(\mathbf{r})$ and (11) can be expressed as $LC(\mathbf{r}) = g(\mathbf{r})$. If we take the Fourier transform of both sides of this equation, which gives $(1 + \ell^2\|\omega\|^2)^{\beta/2}\,\hat{C}(\omega) = \hat{g}(\omega)$, and appeal to (13), we obtain
$$\hat{C}(\omega) = (1 + \ell^2\|\omega\|^2)^{-\beta}.$$
Since the Laplacian, $\Delta$, is invariant under rotations and translations, we have radial symmetry, which is analogous to isotropy in the covariance. Thus we can let $s = \|\omega\|$ and $r = \|\mathbf{r}\|$ to obtain the equivalent expression
$$\hat{C}(s) = (1 + \ell^2 s^2)^{-\beta}. \tag{14}$$
To transform back to the original ($r$) space, we use the Hankel transform [25] and its relationship to the radially symmetric Fourier transform, i.e.,
$$s^{\frac{d-2}{2}}\,\hat{C}(s) = (2\pi)^{\frac{d}{2}} \int_0^\infty J_{\frac{d-2}{2}}(sr)\, r^{\frac{d-2}{2}}\, C(r)\, r\,dr,$$
where $C$ is the original (untransformed) covariance function and $J_\nu(\cdot)$ is the Bessel function of the first kind of order $\nu$; see [26, Section 2] for a proof. Using appropriate substitutions in the inverse Hankel transform and (14), we obtain
$$C(r) = \frac{(2\pi)^{-\frac{d}{2}}}{r^{\frac{d-1}{2}}} \int_0^\infty J_{\frac{d-2}{2}}(sr)\, s^{\frac{d-1}{2}}\, (1 + \ell^2 s^2)^{-\beta}\, (sr)^{1/2}\,ds.$$
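Finally, using the integral identity [27, Eq. 20, p. 24, vol. II] and some algebra, we obtain
$$C(r) = \frac{\ell^{-\beta-\frac{d}{2}}\, r^{\beta-\frac{d}{2}}\, K_{\frac{d}{2}-\beta}(r/\ell)}{(2\pi)^{\frac{d}{2}}\, 2^{\beta-1}\,\Gamma(\beta)}. \tag{15}$$
Using the fact that $K_\nu = K_{-\nu}$, and defining $\sigma^2 := \Gamma(\nu)\left[\ell^d (4\pi)^{d/2}\,\Gamma(\nu + d/2)\right]^{-1}$ with $\nu := \beta - d/2$, it can be shown that (15) is exactly the Matérn covariance function (5).
□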
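The final step of the proof, that (15) coincides with (5) once $\sigma^2$ and $\nu$ are defined as stated, can be spot-checked numerically. The Python sketch below evaluates both expressions for a few $(d, \nu)$ pairs and records the largest relative difference. It is only a sanity check under assumed test values ($\ell = 0.3$ and the listed pairs), not a substitute for the algebra, and the function names are hypothetical.

```python
import numpy as np
from scipy.special import gamma, kv

def cov_from_eq15(r, ell, beta, d):
    """Right-hand side of (15)."""
    num = ell ** (-beta - d / 2) * r ** (beta - d / 2) * kv(d / 2 - beta, r / ell)
    den = (2 * np.pi) ** (d / 2) * 2 ** (beta - 1) * gamma(beta)
    return num / den

def cov_from_eq5(r, ell, nu, d):
    """Matern covariance (5) with sigma^2 as defined at the end of the proof."""
    sigma2 = gamma(nu) / (ell ** d * (4 * np.pi) ** (d / 2) * gamma(nu + d / 2))
    return sigma2 * (r / ell) ** nu * kv(nu, r / ell) / (2 ** (nu - 1) * gamma(nu))

r = np.linspace(0.05, 2.0, 25)
ell = 0.3
max_rel_diff = max(
    np.max(np.abs(cov_from_eq15(r, ell, nu + d / 2, d) / cov_from_eq5(r, ell, nu, d) - 1.0))
    for d, nu in [(1, 0.5), (2, 1.0), (3, 1.5)]
)
```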
2.3. The Effect of a Finite Domain and Boundary Conditions
The proof of Theorem 1 above assumed that the domain was all of $\mathbb{R}^d$, i.e., $\Omega = \mathbb{R}^d$. However, when solving inverse problems, $x(u)$ is restricted to a finite domain $\Omega \subset \mathbb{R}^d$. In such cases, boundary conditions that modify the Green's function must be assumed,
and thus the equivalence between the Gaussian fields defined by the SPDE (6) and those defined by the Matérn covariance function may not hold.
To see this, consider the case where $d = 2$ and $\Omega = [0, 1] \times [0, 1]$ with Dirichlet (zero) boundary conditions, $x(0, t) = x(1, t) = x(s, 0) = x(s, 1) = 0$, where $0 \le s, t \le 1$. Additionally, we assume $\nu = 1$ so that the exponent of the differential operator is equal to one, making the discretization straightforward. In this case, (6) simplifies to
$$(1 - \ell^2\Delta)\,x(u) = W(u), \qquad u \in \mathbb{R}^2, \quad \ell > 0.$$
Using a uniform mesh on $[0, 1] \times [0, 1]$ with a step size of $h = 1/n$, so that $N = n^2$, yields the numerical discretization
$$(I_N + (\ell/h)^2 L_{2D})\,x = \delta^{-1/2}\xi, \qquad \xi \sim N(0, I_N),$$
where $\delta$ is the scaling parameter for the prior and $(1/h^2)L_{2D}x$ is the standard finite-difference discretization of $-\partial^2 x(u)/\partial u_1^2 - \partial^2 x(u)/\partial u_2^2$ [10]. Then the probability distribution of $x$ is given by
$$x \mid \delta, \ell \sim N\left(0,\ \delta^{-1}(I + (\ell/h)^2 L_{2D})^{-2}\right),$$
or equivalently,
$$p(x|\delta, \ell) \propto \exp\left(-\frac{\delta}{2}\,x^T (I + (\ell/h)^2 L_{2D})^2 x\right). \tag{16}$$
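Drawing from (16) amounts to one sparse linear solve per sample. The following Python sketch shows one way this could be set up. It assumes the usual 5-point stencil with zero values outside the grid for $L_{2D}$, which is our reading of the "standard finite-difference discretization" and may differ in detail from the implementation in [10]. The function names and the use of a direct sparse solver are illustrative choices.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

def dirichlet_laplacian_2d(n):
    """Sparse L_2D on an n x n grid: 5-point stencil with zero (Dirichlet) boundary values."""
    L1 = sp.diags([-np.ones(n - 1), 2.0 * np.ones(n), -np.ones(n - 1)], [-1, 0, 1])
    I = sp.identity(n)
    return (sp.kron(I, L1) + sp.kron(L1, I)).tocsc()

def sample_prior(n, ell, delta, rng):
    """One draw from (16): solve (I + (ell/h)^2 L_2D) x = delta^{-1/2} xi."""
    h = 1.0 / n
    B = (sp.identity(n * n) + (ell / h) ** 2 * dirichlet_laplacian_2d(n)).tocsc()
    xi = rng.standard_normal(n * n)          # xi ~ N(0, I_N)
    return spsolve(B, xi / np.sqrt(delta)).reshape(n, n)

rng = np.random.default_rng(0)
x = sample_prior(n=50, ell=0.25, delta=1.0, rng=rng)
```

Because the matrix is sparse and fixed, a factorization could be reused across many draws; the sketch solves from scratch for brevity.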
When discretizing the SPDE, there is a scaling factor needed that guarantees that the variance scales systematically with respect to the change of the length-scaling parameter, $\ell$. The exact form of this scaling factor is unimportant for our purposes since we are ultimately only interested in a regularization parameter, $\alpha$, as will be seen in Section 2.4. To keep notation simpler, we use $\delta$ as a placeholder for this term. This is also the reason we are interested in whether the Matérn correlation, rather than the covariance, is preserved when restricting our Gaussian field to a finite domain.
We now let $n = 50$, so $N = 50^2 = 2500$, and generate 50 000 samples from (16) for each of the $N$ $x_i$ values, calculate the empirical correlation between the samples, and compare this with the theoretical correlation defined by the Matérn covariance function. We do this for $\ell = 1/4$ and plot the results in the middle of Figure 2, together with the Matérn correlation map on the left. It is clear that there is a disconnection between the empirical correlation and the Matérn correlation.
It is crucial that the connection between the Gaussian fields defined by the SPDE and those defined by the Matérn covariance function holds because then the parameters in the SPDE can be estimated using the semivariogram method described in Section 3. Fortunately, we can restore this connection by extending the computational domain. In two dimensions, we extend the domain to $[1 - a, a] \times [1 - a, a]$, for $a > 1$; e.g., if $a = 1.5$ then the extended domain is $[-0.5, 1.5] \times [-0.5, 1.5]$. We then generate realizations for $((2a - 1)n)^2 = (2n)^2 = 10\,000$ $x_i$ values on the extended domain and compute the empirical correlation only for the $x_i$ values that correspond to the original domain, $\Omega = [0, 1] \times [0, 1]$. The results are plotted on the right side of Figure 2, where it is clear that the empirical correlation map is nearly indistinguishable from the one obtained using the Matérn correlation function.
Figure 2. Isotropic correlation maps. Plots of the Matérn correlation map (left), the empirical correlation map with $n = 50$ computed on the domain $\Omega = [0, 1] \times [0, 1]$ (middle), and the empirical correlation map computed on the extended domain $[-0.5, 1.5] \times [-0.5, 1.5]$ (right), computed from random draws from the prior (16) in 2D with $\nu = 1$ and $\ell = 1/4$.
To determine the $a$ value that extends the domain far enough to restore the Matérn/SPDE connection, but not so far as to introduce unnecessary computational cost, we look to the Matérn correlation function itself. We want to extend the domain far enough so that all $x$ values in $[0, 1] \times [0, 1]$ have a sufficiently low correlation with the $x$ values at the edge of the extended domain. The criterion we used to determine if the connection was restored was based on relative error: $\|\rho - \rho_a\|_F / \|\rho\|_F < 0.05$, where $\rho$ is the true Matérn correlation matrix, $\rho_a$ is the approximate correlation matrix obtained by discretizing the SPDE, and $\|\cdot\|_F$ denotes the Frobenius norm.
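The relative-error criterion can be explored without sampling, because the covariance implied by (16) is available in closed form as $(I + (\ell/h)^2 L_{2D})^{-2}$ up to the factor $\delta^{-1}$. The Python sketch below computes $\rho_a$ exactly for a small grid, with and without a padded (extended) domain, and compares it with the Matérn correlation for $\nu = 1$. This replaces the Monte Carlo estimate described in the text with a dense matrix inverse, which is only feasible for small $n$. The cell-centered grid, the value $n = 16$, and the function names are assumptions made for the example.

```python
import numpy as np
from scipy.special import kv

def matern_corr_nu1(r, ell):
    """Matern correlation for nu = 1: (r/ell) K_1(r/ell), equal to 1 at r = 0."""
    s = np.asarray(r, dtype=float) / ell
    out = np.ones_like(s)
    pos = s > 0
    out[pos] = s[pos] * kv(1.0, s[pos])
    return out

def spde_corr(n, ell, pad):
    """Correlation on the original n x n grid implied by (16) on a grid padded by `pad` cells per side."""
    h = 1.0 / n
    m = n + 2 * pad
    L1 = 2.0 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)   # 1D Dirichlet stencil
    L2D = np.kron(np.eye(m), L1) + np.kron(L1, np.eye(m))
    Binv = np.linalg.inv(np.eye(m * m) + (ell / h) ** 2 * L2D)
    C = Binv @ Binv                                            # covariance up to 1/delta
    keep = np.zeros((m, m), dtype=bool)
    keep[pad:pad + n, pad:pad + n] = True                      # cells inside [0, 1]^2
    idx = np.flatnonzero(keep.ravel())
    C = C[np.ix_(idx, idx)]
    sd = np.sqrt(np.diag(C))
    return C / np.outer(sd, sd)

def relative_error(n, ell, pad):
    """||rho - rho_a||_F / ||rho||_F for the given amount of padding."""
    g = (np.arange(n) + 0.5) / n                               # cell centers (an assumption)
    U = np.stack(np.meshgrid(g, g, indexing="ij"), axis=-1).reshape(-1, 2)
    r = np.linalg.norm(U[:, None, :] - U[None, :, :], axis=-1)
    rho = matern_corr_nu1(r, ell)
    return np.linalg.norm(rho - spde_corr(n, ell, pad)) / np.linalg.norm(rho)

n, ell = 16, 0.25
err_original = relative_error(n, ell, pad=0)        # domain [0, 1]^2
err_extended = relative_error(n, ell, pad=n // 2)   # domain [-0.5, 1.5]^2, i.e. a = 1.5
```

The exact error values depend on the grid and on how the boundary is discretized, so they need not match the figures quoted in the text for $n = 50$.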
In tests, it was found that we should always extend the domain at least slightly. If we let $r_c$ be the distance for which the Matérn correlation is approximately equal to $c$, then our tests showed that setting $a = 1 + r_{0.30}$ restores the connection to the Matérn covariance for $\nu \ge 1/2$ when using zero boundary conditions and setting $a = 1 + r_{0.20}$ restores the connection to the Matérn covariance for $\nu \ge 1/2$ when periodic boundary conditions are used. For $\nu = 1$ and $\ell = 1/4$, $a$ should be set to 1.5 in the Dirichlet boundary condition case, which gives a relative error in the difference of the correlation matrices of 0.0375, and it should be set to 1.6 when using periodic boundary conditions. We note that since $\ell$ is directly related to the degree of correlation in the prior, the extension necessary to preserve the connection rises sharply as $\ell$ increases. It is rare in practice, however, to have $\ell \ge 1/4$ when $\nu \ge 1$ since that implies the correlation persists across the entire region. Thus, it is uncommon to have to extend beyond a domain of $[-0.5, 1.5] \times [-0.5, 1.5]$.
For the above discussion, we focused on zero boundary conditions. Similar results hold if periodic boundary conditions are assumed, in which case $L$, and thus $L_{2D}$, can be diagonalized by the FFT, assuming $x$ is defined on a regular grid. The FFT-based diagonalization of $L_{2D}$ can be exploited to greatly reduce computational cost. Thus, when extending the domain in two dimensions, it is advantageous to use periodic boundary conditions and the extended domain $[-0.5, 1.5] \times [-0.5, 1.5]$ so that $L_{2D}$ defined on the extended domain can be diagonalized by the FFT. A more thorough description of the effects of
boundary artifacts with diﬀerent boundary conditions can be found in [28].
Finally, in our numerical experiment above, we chose a specific value of $\nu$, but other values of $\nu$ can be chosen. The general form of the isotropic prior density in two dimensions, with $\nu$ included as a hyperparameter, is
$$p(x|\delta, \nu, \ell) \propto \exp\left(-\frac{\delta}{2}\,x^T (I + (\ell/h)^2 L_{2D})^{\nu + d/2}\,x\right). \tag{17}$$
If $\nu + d/2$ is not an integer, a fractional power of $I + (\ell/h)^2 L_{2D}$ must be computed, which is possible, generally speaking, if we have a diagonalization of $I + (\ell/h)^2 L_{2D}$ in hand, but the resulting precision matrix is typically dense. Such a diagonalization is typically computable in one-dimensional examples, even with dense matrices. In two dimensions, however, an efficient diagonalization is possible only if periodic boundary conditions are assumed. We will restrict the exponent $\nu + d/2$ to be an integer in this paper to preserve the sparsity in the precision matrix, which will be especially useful in Section 5.
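To make the remark about periodic boundary conditions concrete, the sketch below applies $(I + (\ell/h)^2 L_{2D})^{p}$ to an image for an arbitrary, possibly fractional, power $p$ using the 2D FFT. It assumes the periodic 5-point stencil on an $n \times n$ grid with $h = 1/n$, whose eigenvalues are sums of the one-dimensional values $2 - 2\cos(2\pi k/n)$. The function names are hypothetical and the code is an illustration, not the implementation used for the experiments.

```python
import numpy as np

def periodic_precision_eigs(n, ell, power):
    """Eigenvalues of (I + (ell/h)^2 L_2D)^power for periodic L_2D on an n x n grid, h = 1/n."""
    h = 1.0 / n
    lam1 = 2.0 - 2.0 * np.cos(2.0 * np.pi * np.arange(n) / n)   # 1D periodic stencil spectrum
    lam2 = lam1[:, None] + lam1[None, :]                         # 2D spectrum
    return (1.0 + (ell / h) ** 2 * lam2) ** power

def apply_precision(X, ell, power):
    """Apply the (possibly fractional) power of the operator to an n x n image X."""
    eigs = periodic_precision_eigs(X.shape[0], ell, power)
    return np.real(np.fft.ifft2(eigs * np.fft.fft2(X)))

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 8))
Y = apply_precision(X, ell=0.1, power=1.5)   # e.g. nu = 1/2, d = 2
```

With this representation the operator is never formed as a matrix, so the loss of sparsity for non-integer powers does not matter in the periodic case.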
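2.4. Computing MAP Estimators for Whittle-Matérn Priors
Using Bayes' law, we multiply the prior (17) by the likelihood (2) to obtain the posterior density function
$$p(x|b, \lambda, \delta, \nu, \ell) \propto p(b|x, \lambda)\,p(x|\delta, \nu, \ell) \propto \exp\left(-\frac{\lambda}{2}\|Ax - b\|^2 - \frac{\delta}{2}\,x^T (I + (\ell/h)^2 L_{2D})^{\nu + d/2}\,x\right).$$
The maximizer of $p(x|b, \lambda, \delta, \nu, \ell)$ is known as the MAP estimator, and it can be computed by solving
$$x_\alpha = \arg\min_x \left\{\frac{1}{2}\|Ax - b\|^2 + \frac{\alpha}{2}\,x^T (I + (\ell/h)^2 L_{2D})^{\nu + d/2}\,x\right\} = \left(A^TA + \alpha (I + (\ell/h)^2 L_{2D})^{\nu + d/2}\right)^{-1} A^T b, \tag{18}$$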
where $\alpha = \delta/\lambda$. Assuming we know $\ell$ and $\nu$, $\alpha$ can be estimated using one of many regularization parameter selection methods (see, e.g., [29, 30, 10]). One such method is generalized cross validation (GCV):
$$\alpha = \arg\min_{\eta > 0} \left\{ \frac{\left\|A\left(A^TA + \eta P\right)^{-1} A^T b - b\right\|^2}{\operatorname{tr}\left(I - A\left(A^TA + \eta P\right)^{-1} A^T\right)^2} \right\} \tag{19}$$
for $P = (I + (\ell/h)^2 L_{2D})^{\nu + d/2}$.
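For small problems, (18) and the GCV selection of $\alpha$ can be written directly with dense linear algebra. The Python sketch below does so for a toy one-dimensional inpainting problem. It assumes the usual form of the GCV function, with the squared trace in the denominator, and minimizes it over $\log_{10}\eta$ on a bounded interval; the interval, the toy signal, the masking matrix, and the function names are all assumptions made for the example. For image-sized problems the dense solves here would be replaced by an iterative method.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def map_estimate(A, b, P, alpha):
    """MAP estimate from (18): solve (A^T A + alpha P) x = A^T b."""
    return np.linalg.solve(A.T @ A + alpha * P, A.T @ b)

def gcv(eta, A, b, P):
    """GCV objective for a trial value eta (squared trace in the denominator)."""
    H = A @ np.linalg.solve(A.T @ A + eta * P, A.T)   # maps b to A x_eta
    resid = H @ b - b
    return (resid @ resid) / np.trace(np.eye(len(b)) - H) ** 2

def select_alpha(A, b, P, log10_bounds=(-6.0, 4.0)):
    """Minimize GCV over log10(eta); the search interval is an assumption."""
    res = minimize_scalar(lambda t: gcv(10.0 ** t, A, b, P),
                          bounds=log10_bounds, method="bounded")
    return 10.0 ** res.x

# Toy 1D inpainting problem: observe every other pixel of a smooth signal.
rng = np.random.default_rng(0)
N, ell = 50, 0.1
h = 1.0 / N
L1 = 2.0 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)   # 1D Dirichlet stencil
B = np.eye(N) + (ell / h) ** 2 * L1
P = B @ B                                                  # integer exponent 2, chosen for simplicity
A = np.eye(N)[::2]                                         # masking (row-selection) matrix
x_true = np.sin(2.0 * np.pi * np.linspace(0.0, 1.0, N))
b = A @ x_true + 0.05 * rng.standard_normal(A.shape[0])

alpha = select_alpha(A, b, P)
x_alpha = map_estimate(A, b, P, alpha)
```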
In practice, ν is often ﬁxed [31, 12] and ℓis either estimated manually or by
using the fully Bayesian approach, which involves Markov chain Monte Carlo (MCMC)
[32] sampling.
This requires setting up hyperprior distributions and can be time
consuming, subjective and unintuitive, so we present a new method for selecting these
hyperparameters next.

3. The Semivariogram Method for Estimating ν and ℓ
In the inverse problem formulation above, the components of the vector $x$ correspond to values of an unknown function $x$ at numerical mesh points within a spatial region $\Omega$. This motivates using methods from spatial statistics to estimate the Whittle-Matérn prior hyperparameters $\nu$ and $\ell$. One such method uses a variogram, and a corresponding semivariogram [33], which requires the assumption of intrinsic stationarity, i.e., that the elements of $x$ have constant mean and that the variance of the difference between two elements depends only on their separation and not on where they lie in the region. This is a weaker assumption than is required by many other parameter estimation tools, which is one of the reasons variograms have become popular in spatial statistical applications [34], and it is the reason we use semivariograms here. Although semivariograms are commonly used in spatial statistics to estimate the parameters that determine a covariance structure, this is, to our knowledge, the first time these tools have been used to estimate prior hyperparameters for use in inverse problems.
The semivariogram is defined by $\gamma(\mathbf{r}) = \frac{1}{2}\mathrm{Var}[Z(u_i) - Z(u_j)]$, where $\mathbf{r} = u_i - u_j$ and $\{Z(u) : u \in \Omega \subset \mathbb{R}^d\}$ is a spatial process. Due to our stationarity assumption, $\mathrm{Var}[Z(u_i)] = \mathrm{Var}[Z(u_j)] = \sigma^2$, which we use to derive the following alternative expression for $\gamma(\mathbf{r})$:
$$\gamma(\mathbf{r}) = \frac{1}{2}\Big(\mathrm{Var}[Z(u_i)] + \mathrm{Var}[Z(u_j)] - 2\,\mathrm{Cov}[Z(u_i), Z(u_j)]\Big) = \sigma^2 - \mathrm{Cov}[Z(u_i), Z(u_j)].$$
Thus, the semivariogram simplifies to the difference between the variance in the region and the covariance between two points separated by $\mathbf{r}$. The variogram is formally defined as $2\gamma(\mathbf{r})$, hence the terms variogram and semivariogram are often used interchangeably. To remain consistent, we will continue to refer to $\gamma(\mathbf{r})$ as a semivariogram throughout the paper.
We now need a way to estimate the semivariogram from given data. For this, we use what is known as the sample, or empirical, semivariogram. Assuming that $Z(u)$ is isotropic, so that $r = \|\mathbf{r}\| = \|u_i - u_j\|$, the empirical semivariogram can be expressed as
$$\hat{\gamma}(r) = \frac{1}{2n(r)} \sum_{(i,j)\,:\,\|u_i - u_j\| = r} [z(u_i) - z(u_j)]^2, \tag{20}$$
where $z(u)$ is a realization of $Z(u)$, and $n(r)$ is the number of pairs of points that are separated by a distance $r$. The $\hat{\gamma}(r)$ values are often referred to as the semivariance values. In a typical semivariogram, the semivariance values increase as $r$ increases since points tend to be less similar the further apart they are, which increases the variance of their differences.
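A direct implementation of (20) loops over all pairs of points. The Python sketch below does this with pairwise distances and, as a practical deviation from (20), groups pairs into distance bins rather than requiring exactly equal separations. The binning, the function name `empirical_semivariogram`, and the white-noise test field are assumptions for the example; the paper itself adapts the MATLAB code of [35] for this step.

```python
import numpy as np
from scipy.spatial.distance import pdist

def empirical_semivariogram(coords, z, bin_edges):
    """Binned version of (20): returns bin centers, semivariances and pair counts."""
    d = pdist(coords)                                    # distance for every pair (i < j)
    sq = pdist(z.reshape(-1, 1), metric="sqeuclidean")   # (z_i - z_j)^2 in the same pair order
    which = np.digitize(d, bin_edges) - 1                # bin index of each pair
    nb = len(bin_edges) - 1
    valid = (which >= 0) & (which < nb)
    counts = np.bincount(which[valid], minlength=nb)
    sums = np.bincount(which[valid], weights=sq[valid], minlength=nb)
    with np.errstate(invalid="ignore", divide="ignore"):
        gamma_hat = sums / (2.0 * counts)                # NaN where a bin has no pairs
    centers = 0.5 * (bin_edges[:-1] + bin_edges[1:])
    return centers, gamma_hat, counts

# Hypothetical example: white noise on a 30 x 30 grid has a flat semivariogram near its variance.
rng = np.random.default_rng(0)
g = np.linspace(0.0, 1.0, 30)
coords = np.stack(np.meshgrid(g, g, indexing="ij"), axis=-1).reshape(-1, 2)
z = rng.standard_normal(coords.shape[0])
centers, gamma_hat, counts = empirical_semivariogram(coords, z, np.linspace(0.0, 0.5, 11))
```

Forming all pairs costs $O(N^2)$ memory, so for large images one would subsample points or restrict the maximum lag.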
Although the empirical semivariogram is useful in obtaining semivariance values from data, it is not ideal for modeling data for various reasons (see [34] for details), so it is typical to fit a semivariogram model to the empirical semivariogram. Since
our prior distribution for $x$ has a Matérn covariance, we will use the theoretical Matérn semivariogram model [4, 2] given by
$$\gamma(r, \theta) = \begin{cases} 0 & \text{if } r = 0, \\ a_0 + (\sigma^2 - a_0)\left(1 - \dfrac{1}{2^{\nu-1}\Gamma(\nu)}\,(r/\ell)^{\nu} K_\nu(r/\ell)\right) & \text{if } r > 0, \end{cases} \tag{21}$$
where $a_0 \ge 0$ is the nugget, $\sigma^2 \ge a_0$ is the sill, and $\theta = (a_0, \sigma^2, \nu, \ell)$. The nugget is the term given to the semivariance value at a distance just greater than zero and the sill is the total variance contribution or the semivariance value where the model levels out. The sill, $\sigma^2$, is also the variance parameter in the Matérn covariance function (5). We can estimate $a_0$, $\sigma^2$, $\nu$, and $\ell$ by fitting semivariogram models to the empirical semivariogram.
There are a number of ways to fit the semivariogram model to the empirical semivariogram. We use weighted least squares, as is commonly done [34], choosing the $\theta$ that minimizes
$$W(\theta) = \sum_r \frac{n(r)}{2[\gamma(r, \theta)]^2}\,[\hat{\gamma}(r) - \gamma(r, \theta)]^2. \tag{22}$$
To minimize W(θ), we adapt the MATLAB codes from [35, 36].
More speciﬁcally,
we adapt [35] for computing the empirical semivariance ˆγ(r) and we adapt [36] for
minimizing W(θ). Although it is possible to optimize both ν and ℓcontinuously, we
will require ν + d/2 to be an integer.
Weighted least squares, in general, performs
well when ﬁnding optimal estimates for a0, σ2, and ℓfor given empirical semivariogram
values when ν is ﬁxed, but not when ν is also free to vary (most software requires a ﬁxed
ν value). To combat this issue, and to ensure ν + d/2 is an integer, we cycle through
various ﬁxed values of ν to obtain estimates for the other parameters and their weighted
least squares value. We then choose the θ with the smallest W(θ).
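The cycling strategy described above can be sketched in Python as follows. For each fixed candidate $\nu$, the remaining parameters $(a_0, \sigma^2, \ell)$ are fitted by minimizing (22), and the candidate with the smallest $W(\theta)$ is kept. This is not the MATLAB code of [35, 36]: the use of `scipy.optimize.least_squares`, the starting point, the bounds, the candidate set $\{1, 2, 3\}$, and the noise-free synthetic data are all assumptions made for the illustration.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.special import gamma, kv

def matern_semivariogram(r, a0, sill, nu, ell):
    """Matern semivariogram model (21) for distances r > 0."""
    s = r / ell
    corr = s ** nu * kv(nu, s) / (2.0 ** (nu - 1.0) * gamma(nu))
    return a0 + (sill - a0) * (1.0 - corr)

def wls_objective(theta, nu, r, gamma_hat, n_pairs):
    """Weighted least squares criterion (22) for theta = (a0, sill, ell) and fixed nu."""
    a0, sill, ell = theta
    model = matern_semivariogram(r, a0, sill, nu, ell)
    return np.sum(n_pairs / (2.0 * model ** 2) * (gamma_hat - model) ** 2)

def initial_guess(r, gamma_hat):
    """Crude starting point (an assumption): (a0, sill, ell)."""
    return np.array([0.5 * gamma_hat.min(), gamma_hat.max(), 0.1 * r.max()])

def fit_matern_semivariogram(r, gamma_hat, n_pairs, nu_candidates=(1, 2, 3)):
    """Fit (a0, sill, ell) by WLS for each fixed nu and keep the fit with the smallest W."""
    a0_0, sill_0, ell_0 = initial_guess(r, gamma_hat)
    best = None
    for nu in nu_candidates:
        def residuals(p, nu=nu):
            a0, c, ell = p                       # c = sill - a0 keeps sill >= a0
            model = matern_semivariogram(r, a0, a0 + c, nu, ell)
            return np.sqrt(n_pairs / 2.0) * (gamma_hat - model) / model
        sol = least_squares(residuals, [a0_0, sill_0 - a0_0, ell_0],
                            bounds=([0.0, 1e-8, 1e-6], np.inf))
        a0, c, ell = sol.x
        W = 2.0 * sol.cost                       # least_squares minimizes 0.5 * sum(residuals^2)
        if best is None or W < best["W"]:
            best = {"a0": a0, "sill": a0 + c, "nu": nu, "ell": ell, "W": W}
    return best

# Synthetic, noise-free semivariances generated from the model itself (illustration only).
r = np.linspace(0.01, 0.6, 40)
n_pairs = np.full(r.shape, 500.0)
gamma_hat = matern_semivariogram(r, a0=0.206, sill=1.003, nu=2, ell=0.019)
fit = fit_matern_semivariogram(r, gamma_hat, n_pairs)
```

Real empirical semivariograms are noisy, so in practice several starting points may be needed before trusting the selected $\nu$.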
For an illustration, we generated a random field, shown on the left side of Figure 3, and fit a semivariogram to the field. The optimized parameters of the model are $\nu = 2$ and $\ell = 0.019$, which corresponds to a practical range of 0.102. Thus, the values of the field are nearly independent a tenth of the way across the region. The sill and nugget are estimated to be $\sigma^2 = 1.003$ and $a_0 = 0.206$, respectively. A plot of the resulting fitted Matérn semivariogram model is given on the right side of Figure 3.
The values of $\nu$ and $\ell$ from $\theta = (a_0, \sigma^2, \nu, \ell)$ obtained by fitting the Matérn semivariogram model to a spatial field, as described in the previous paragraph, can be used to define the Whittle-Matérn prior (17). The sill, $\sigma^2$, and the nugget, $a_0$, are not especially useful outside of fitting the semivariogram model because they do not correspond to any hyperparameter in (17). They are helpful only in determining the best estimates for $\nu$ and $\ell$. Any contribution these parameters may have made to the prior distribution will be accounted for in the regularization parameter, $\alpha$. Therefore, after fitting the semivariogram models, $\sigma^2$ and $a_0$ are discarded.
With estimates for ν and ℓ in hand, the MAP estimator, x_α, can then be computed
as in Section 2.4, from which we can recompute θ by fitting the Matérn semivariogram
model to the empirical semivariogram values of x_α. Repeating this process iteratively
yields Algorithm 1. Recall that b is a vector of measurements, which will usually be
noisy or have some missing values, and each element of b has a corresponding spatial
position. Since ν is optimized over a discrete set to ensure that β = ν + d/2 is an integer,
ν is considered converged when ν_j − ν_{j−1} = 0, where ν_j is the value of ν fit in the jth
iteration. Similarly, ℓ is said to have converged when |ℓ_j − ℓ_{j−1}|/ℓ_{j−1} < ε, with ε
determined by the user. In this paper, we will consider ℓ to have converged when the
relative difference is less than 0.01, which usually takes fewer than three iterations to achieve.

Figure 3. Semivariogram. A randomly generated spatial field is shown on the left and
the empirical semivariogram, along with the Matérn model fit, is given on the right.
The fitted hyperparameters are ν = 2 and ℓ = 0.019, which corresponds to a practical
range of 0.102.

Algorithm 1 The semivariogram method for MAP estimation with Whittle-Matérn
prior.
0. Estimate θ = (a_0, σ², ν, ℓ) by fitting a Matérn semivariogram model to b.
1. Define the prior (17) using ν and ℓ, compute α using (19), and compute x_α using
(18).
2. Update θ = (a_0, σ², ν, ℓ) by fitting a Matérn semivariogram model to x_α.
3. Return to step 1 and repeat until ν and ℓ stabilize.
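The stopping rule just described can be written down in a few lines. The sketch below is only an illustration: the function name `has_converged` and the sequence of iterates in `history` are hypothetical and were invented for this example, not taken from the experiments in this paper.

```python
def has_converged(nu_prev, nu_curr, ell_prev, ell_curr, eps=0.01):
    """Stopping rule from the text: nu unchanged and the relative
    change in ell below the user-chosen tolerance eps."""
    nu_stable = (nu_curr - nu_prev) == 0
    ell_stable = abs(ell_curr - ell_prev) / ell_prev < eps
    return nu_stable and ell_stable


# Hypothetical iterates (nu_j, ell_j), invented purely for illustration.
history = [(2, 0.0400), (1, 0.0370), (1, 0.0364), (1, 0.0363)]

stop_at = None
for j in range(1, len(history)):
    (nu_prev, ell_prev), (nu_curr, ell_curr) = history[j - 1], history[j]
    if has_converged(nu_prev, nu_curr, ell_prev, ell_curr):
        stop_at = j  # first iteration at which both criteria hold
        break
```

Because ν is restricted to a discrete set, an exact equality test on ν is reasonable here; ℓ is continuous and therefore needs a relative tolerance.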
The semivariogram method is essentially a parametric empirical Bayes method
[37] for point estimation.
We have a distributional assumption on x, but no prior
distributions are assumed for ν or ℓ. The hyperparameters are instead estimated by
iteratively ﬁtting semivariograms to the data.
3.1. Numerical Experiments
We now implement the semivariogram method on a two-dimensional deblurring and
inpainting example. Recall that the connection between the Mat´ern covariance and
the Whittle-Mat´ern prior depends on a stationarity assumption, which the following
example may not exhibit.
For simplicity, we will still assume stationarity and
acknowledge that future work should be done in the case when no stationarity is
present. Additionally, the numerical examples given in this paper all use color images.
In our analysis, we will assume independence in the color bands and obtain priors and
reconstructions for each one individually.
3.1.1. Results. In this example, we assume periodic boundary conditions on the
extended domain, but due to the restriction from the extended domain to Ω, circulant
structure is lost in the forward model matrix, and hence linear system solves must
be done using an iterative method. As in [10, Section 3.1.3], we use preconditioned
conjugate gradient (PCG) iteration, both for computing α and for computing x_α. We
attempt to deblur and demask a 128 × 128 image of Main Hall on the University of
Montana (UM) campus. To do this, we begin with a 256 × 256 image, given in Figure
4, and then restrict to the central 128 × 128 subimage. This smaller image in the middle
will be thought of as being on the domain Ω = [0, 1] × [0, 1], and the larger, full image will
then be defined on the extended domain [−0.5, 1.5] × [−0.5, 1.5].
To obtain b, we ﬁrst perform a slight blurring operation on the full 256×256 true
image plotted in Figure 4. Since this is a color image, the deblurring process is done
individually for the red, green, and blue intensity arrays. We then restrict to the central
128 × 128 pixels (with boundaries denoted in Figure 4) and randomly remove 40% of
the pixels to obtain the masked, and moderately blurry image on the left in Figure 5.
We seek an estimate of x in the same central subregion.
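The restriction and masking steps described above can be sketched as follows. This is a minimal illustration under stated assumptions: the array `full` is random stand-in data rather than the Main Hall image, the blurring step is omitted, and the names `center`, `observed`, and `b` are chosen here for convenience.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for one color band of the (already blurred) 256 x 256 image.
# Random values are used here; this is not the photograph from the paper.
full = rng.random((256, 256))

# Restrict to the central 128 x 128 block, i.e. rows/columns 64..191.
n_full, n = full.shape[0], 128
start = (n_full - n) // 2
center = full[start:start + n, start:start + n]

# Randomly remove 40% of the pixels; True marks an observed pixel.
n_pix = center.size
removed = rng.choice(n_pix, size=round(0.4 * n_pix), replace=False)
observed = np.ones(n_pix, dtype=bool)
observed[removed] = False
observed = observed.reshape(center.shape)

# Data vector b: only the observed pixel values of the central block.
b = center[observed]
```

Keeping the boolean mask `observed` alongside `b` is convenient because the same mask identifies which pixel pairs may enter the empirical semivariogram.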
Omnidirectional semivariograms with 25 approximately equally spaced grid points in
0 < r < √2/10 are used. We chose √2/10 as a cutoff because it balances the need to
capture the covariance structure at short distances, which are well known to be the most
important [34], with those at longer distances. When fitting semivariograms to the masked
image, the removed entries will not be considered or else the correlation would be strongly
influenced by those entries.

Figure 4. Full 256 × 256 image of Main Hall at the University of Montana with
128 × 128 subimage.

Figure 5. Two-dimensional image deblurring test case. On the left is a plot of the
blurred, masked, and noisy data; in the middle is a plot of the Tikhonov solution; and
on the right is a plot of the solution obtained using the Whittle-Matérn prior with ν = 1
and ℓ = 0.0364, 0.0313 and 0.0543 for red, green and blue intensities, respectively.
The semivariogram method is used to obtain ν = 1 for each color band, ℓ =
0.0364, 0.0313 and 0.0543 for the red, green, and blue intensities, respectively, and
α = 0.0023, 0.0018 and 5.47 × 10^{−5}. Convergence was met in two iterations for each
color intensity. We also computed the Tikhonov solution, as defined in [10, Section
3.1.3], for which the prior covariance is equal to a scalar multiple of the identity matrix.
The Tikhonov α values for all three color bands were around 0.0004. Note that for
both of these reconstructions, the regularization parameter, α, was chosen to maximize
the correlation between the solution and the true image rather than by GCV, to ensure
that any differences in the solutions are due to the method and not a poorly chosen
regularization parameter.
The two solutions are plotted in Figure 5. It is clear that the solution that used the
Whittle-Matérn prior is the superior reconstruction. The correlation between x_α and
x, the true image, is 0.950. While the Tikhonov solution is able to remove the blur, it
performs inpainting poorly since the pixel values are assumed independent of one another
due to the identity covariance matrix.
3.2. Discussion
Compared to the fully Bayesian method, the semivariogram procedure has some key
advantages. This technique produces competitive solutions and clearer interpretations
of the hyperparameters ν and ℓ, and it can inform how far to extend the domain to
maintain a connection with the Mat´ern covariance. Additionally, the computation time
is only a fraction of what is needed to compute an adequate number of MCMC samples.
In our implementation of the example above, the semivariogram method was more
than 20 times faster than the fully Bayesian MCMC method. Finally, it is not trivial
to sample from a complex model such as this one without signiﬁcant autocorrelation,
whereas sampling is not needed for the semivariogram method.
The primary disadvantage is that we lose uncertainty quantification. We also have
to calculate α, the regularization parameter, using other techniques like GCV. One other
shortcoming to the semivariogram method, as described in this section, is the fact that
it requires the ﬁeld or image to be isotropic. In the next section, we extend these results
to anisotropic ﬁelds.
4. Geometric Anisotropy
The solution to (6) is an isotropic Gaussian ﬁeld, which means the correlation length is
the same in every direction. This isotropy assumption is often not satisﬁed and so it will
be useful to have an alternate SPDE formulation for the case when correlation lengths
diﬀer with direction. This is known as geometric anisotropy [34]. The groundwork for
constructing priors that can model anisotropy has been laid in works such as [13, 7, 38].
4.1. Anisotropic SPDE
We will derive an anisotropic SPDE that can be used in a similar way to how (6) was used
in the prior modeling in the isotropic case. We will only consider the two-dimensional case,
but results can be extended to d > 1 dimensions. In two dimensions, for a Gaussian
field with correlation length ℓ_1 in the direction of the angle θ, where −π/2 < θ ≤ π/2 is
measured counter-clockwise from the x-axis, and correlation length ℓ_2 in the direction
perpendicular to θ, we can make the following change of variables from isotropic to
anisotropic coordinates:
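\[
w = \begin{bmatrix} \cos\theta & -(\ell_2/\ell_1)\sin\theta \\ \sin\theta & (\ell_2/\ell_1)\cos\theta \end{bmatrix}
\begin{bmatrix} u_1 \\ u_2 \end{bmatrix}
\tag{23}
\]
and thus
\[
\begin{aligned}
w_1(u_1, u_2) &= \cos\theta\, u_1 - (\ell_2/\ell_1)\sin\theta\, u_2, \\
w_2(u_1, u_2) &= \sin\theta\, u_1 + (\ell_2/\ell_1)\cos\theta\, u_2.
\end{aligned}
\]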
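As a quick sanity check on the change of variables (23), the following sketch applies the map and then undoes it with the inverse that the paper later labels (24). The function names and the numerical values of θ, ℓ_1, and ℓ_2 are illustrative assumptions only.

```python
import math


def to_anisotropic(u, theta, ell1, ell2):
    """Apply (23): map isotropic coordinates u = (u1, u2) to w = (w1, w2)."""
    s = ell2 / ell1
    u1, u2 = u
    w1 = math.cos(theta) * u1 - s * math.sin(theta) * u2
    w2 = math.sin(theta) * u1 + s * math.cos(theta) * u2
    return (w1, w2)


def to_isotropic(w, theta, ell1, ell2):
    """Inverse of (23), with tau = ell1 / ell2 (the map labelled (24))."""
    tau = ell1 / ell2
    w1, w2 = w
    u1 = math.cos(theta) * w1 + math.sin(theta) * w2
    u2 = tau * (-math.sin(theta) * w1 + math.cos(theta) * w2)
    return (u1, u2)


# Illustrative values: 45 degree direction, anisotropy ratio 3.
theta, ell1, ell2 = math.radians(45.0), 0.3, 0.1
u = (0.2, -0.5)
w = to_anisotropic(u, theta, ell1, ell2)
u_back = to_isotropic(w, theta, ell1, ell2)  # should reproduce u
```

A unit step along u_1 is sent to a unit step in the direction θ, while a unit step along u_2 is shortened to length ℓ_2/ℓ_1 in the perpendicular direction, which is exactly the geometric anisotropy being modeled.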
We will apply the change of variables (23) to both sides of (6) to obtain the
analogous anisotropic SPDE. The Laplacian on the left-hand side can be altered using
the chain rule:
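\[
\frac{\partial}{\partial u_1} = \frac{\partial}{\partial w_1}\frac{\partial w_1}{\partial u_1} + \frac{\partial}{\partial w_2}\frac{\partial w_2}{\partial u_1}
\quad\text{and}\quad
\frac{\partial}{\partial u_2} = \frac{\partial}{\partial w_1}\frac{\partial w_1}{\partial u_2} + \frac{\partial}{\partial w_2}\frac{\partial w_2}{\partial u_2},
\]
which means
\[
\begin{aligned}
\frac{\partial^2}{\partial u_1^2}
&= \left(\frac{\partial^2}{\partial w_1^2}\frac{\partial w_1}{\partial u_1} + \frac{\partial^2}{\partial w_1 \partial w_2}\frac{\partial w_2}{\partial u_1}\right)\frac{\partial w_1}{\partial u_1}
 + \left(\frac{\partial^2}{\partial w_1 \partial w_2}\frac{\partial w_1}{\partial u_1} + \frac{\partial^2}{\partial w_2^2}\frac{\partial w_2}{\partial u_1}\right)\frac{\partial w_2}{\partial u_1} \\
&= \cos^2\theta\,\frac{\partial^2}{\partial w_1^2} + 2\sin\theta\cos\theta\,\frac{\partial^2}{\partial w_1 \partial w_2} + \sin^2\theta\,\frac{\partial^2}{\partial w_2^2}
\end{aligned}
\]
and
\[
\begin{aligned}
\frac{\partial^2}{\partial u_2^2}
&= \left(\frac{\partial^2}{\partial w_1^2}\frac{\partial w_1}{\partial u_2} + \frac{\partial^2}{\partial w_1 \partial w_2}\frac{\partial w_2}{\partial u_2}\right)\frac{\partial w_1}{\partial u_2}
 + \left(\frac{\partial^2}{\partial w_1 \partial w_2}\frac{\partial w_1}{\partial u_2} + \frac{\partial^2}{\partial w_2^2}\frac{\partial w_2}{\partial u_2}\right)\frac{\partial w_2}{\partial u_2} \\
&= (\ell_2/\ell_1)^2\sin^2\theta\,\frac{\partial^2}{\partial w_1^2} - 2(\ell_2/\ell_1)^2\sin\theta\cos\theta\,\frac{\partial^2}{\partial w_1 \partial w_2} + (\ell_2/\ell_1)^2\cos^2\theta\,\frac{\partial^2}{\partial w_2^2}.
\end{aligned}
\]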
The right-hand side of (6) is updated by changing the coordinates of the white
noise. The inverse transformation of (23) is
\[
u = \begin{bmatrix} \cos\theta & \sin\theta \\ -\tau\sin\theta & \tau\cos\theta \end{bmatrix}
\begin{bmatrix} w_1 \\ w_2 \end{bmatrix} := f(w),
\tag{24}
\]
where τ = ℓ_1/ℓ_2. Now, we define the transformed white noise basis functions as
\[
\tilde{\phi}_j(w) = \phi_j(f(w))\,|\det(J_f(w))|^{1/2} = \phi_j(f(w))\,(\ell_1/\ell_2)^{1/2},
\]
where det(J_f(w)) denotes the determinant of the Jacobian of the transformation f(w),
which is ℓ_1/ℓ_2 in our case. This will preserve the orthonormal properties of the basis
functions. Then, appealing to (7),
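\[
W(w) = \sum_{j=1}^{\infty} \xi_j \tilde{\phi}_j(w)
     = \sum_{j=1}^{\infty} \xi_j \phi_j(u)\,(\ell_1/\ell_2)^{1/2}
     = (\ell_1/\ell_2)^{1/2}\, W(u),
\qquad \xi_j \overset{\text{iid}}{\sim} N(0, 1),
\]
which means W(u) = (ℓ_2/ℓ_1)^{1/2} W(w).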
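So, taking ℓ = ℓ_1 and making the appropriate substitutions, (6) is converted to the
anisotropic SPDE:
\[
\left(1 - \left[(a_\theta^2 + b_\theta^2)\frac{\partial^2}{\partial w_1^2}
 + (c_\theta^2 + d_\theta^2)\frac{\partial^2}{\partial w_2^2}
 - 2(a_\theta c_\theta - b_\theta d_\theta)\frac{\partial^2}{\partial w_1 \partial w_2}\right]\right)^{\beta/2} x(w)
 = (\ell_2/\ell_1)^{1/2}\, W(w),
\]
where a_θ = ℓ_2 sin θ, b_θ = ℓ_1 cos θ, c_θ = ℓ_2 cos θ, and d_θ = ℓ_1 sin θ. For
\[
R = \begin{bmatrix} \ell_1\cos\theta & \ell_1\sin\theta \\ -\ell_2\sin\theta & \ell_2\cos\theta \end{bmatrix},
\]
the above SPDE can be written
\[
\left(1 - \nabla\cdot R^T R\,\nabla\right)^{\beta/2} x(w) = (\ell_2/\ell_1)^{1/2}\, W(w).
\tag{25}
\]
Notice that if ℓ_1 = ℓ_2, this SPDE is equivalent to (6) with ℓ = ℓ_1.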
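The agreement between the coefficient form of the anisotropic SPDE and the compact form (25) can be checked numerically. The sketch below, whose function names and test values are assumptions made for illustration, compares the coefficients built from a_θ, b_θ, c_θ, d_θ with the entries of RᵀR, using the fact that ∇·H∇ = H_11 ∂²/∂w_1² + 2H_12 ∂²/(∂w_1∂w_2) + H_22 ∂²/∂w_2² for a constant symmetric H.

```python
import math


def operator_coefficients(theta, ell1, ell2):
    """Coefficients of d2/dw1^2, d2/dw2^2 and d2/(dw1 dw2) inside the
    square brackets of the anisotropic SPDE."""
    a = ell2 * math.sin(theta)
    b = ell1 * math.cos(theta)
    c = ell2 * math.cos(theta)
    d = ell1 * math.sin(theta)
    return (a * a + b * b, c * c + d * d, -2.0 * (a * c - b * d))


def rtr(theta, ell1, ell2):
    """Form R^T R for R = [[l1 cos, l1 sin], [-l2 sin, l2 cos]]."""
    R = [[ell1 * math.cos(theta), ell1 * math.sin(theta)],
         [-ell2 * math.sin(theta), ell2 * math.cos(theta)]]
    return [[sum(R[k][i] * R[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


# Illustrative values only.
theta, ell1, ell2 = math.radians(30.0), 0.3, 0.1
c11, c22, c12 = operator_coefficients(theta, ell1, ell2)
H = rtr(theta, ell1, ell2)          # H[0][0], H[1][1], 2*H[0][1] match c11, c22, c12
H_iso = rtr(theta, 0.2, 0.2)        # reduces to 0.2**2 times the identity
```

In the isotropic case the rotation drops out entirely, which is the numerical counterpart of the remark that (25) reduces to (6) when ℓ_1 = ℓ_2.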
4.2. The Gaussian Field Solution of the SPDE (25)
As in the isotropic case, we are interested in the properties of the solution of (25),
especially its covariance function. First, we define the anisotropic Matérn covariance
function [39] as
\[
C(r_w) = \sigma^2\,\frac{(r_w/\zeta)^{\nu} K_{\nu}(r_w/\zeta)}{2^{\nu-1}\Gamma(\nu)},
\quad\text{with}\quad
\zeta = \frac{\ell_1}{\sqrt{\cos^2(\psi-\theta) + (\ell_1/\ell_2)^2\sin^2(\psi-\theta)}},
\tag{26}
\]
where r_w = ‖w_i − w_j‖ is the distance between the anisotropic coordinates, ζ is the new
range parameter in the direction of ψ, ℓ_1 is the correlation length in the direction of
θ and ℓ_2 is the correlation length in the direction perpendicular to θ. Notice that the
smoothness parameter, ν, is unaffected.
The remainder of this subsection contains results used to prove the following
theorem.
Theorem 2 The solution x(w) of (25) is a Gaussian ﬁeld with mean zero and
anisotropic Mat´ern covariance function deﬁned by (26).
Proof. First, we derive the Green's function for (25), which is the solution of
\[
\left(1 - \nabla\cdot R^T R\,\nabla\right)^{\beta/2} g(w, v) = \delta_f(v - w).
\tag{27}
\]
Using (8), the solution to (25) is given by
\[
x(w) = (\ell_2/\ell_1)^{1/2} \int_{\mathbb{R}^2} g(w, v)\, W(v)\, dv,
\tag{28}
\]
which makes x(w) a Gaussian field since it is a linear transformation of Gaussian white
noise. Be aware that we are still assuming stationarity in our field. To derive the Green's
function g in (28), we first define g(w) := g(w, 0). Then (27) implies
\[
\left(1 - \nabla\cdot R^T R\,\nabla\right)^{\beta/2} g(w) = \delta_f(w).
\tag{29}
\]
We would like to change from the anisotropic coordinates w to the isotropic
coordinates u in (29) so we can use the results from Section 2.2. We again use (24)
for the coordinate change and, in a similar fashion as was done earlier, we apply
the chain rule to replace ∂²/∂w_1², ∂²/∂w_2², and ∂²/(∂w_1∂w_2) in
(1 − ∇·RᵀR∇)^{β/2} with partial derivatives in terms of u. When making this change,
the coefficients of ∂²/∂u_1², ∂²/∂u_2², and ∂²/(∂u_1∂u_2) are ℓ_1², ℓ_1², and 0,
respectively, and so we have (1 − ∇·RᵀR∇)g(w) = (1 − ℓ_1²∆)g(u). Additionally, we can
change variables in the Delta function on the right side of (29) by multiplying by the
determinant of the Jacobian of (24): ℓ_1/ℓ_2. Thus, the change of variables transforms
(29) into the equation
\[
\left(1 - \ell_1^2 \Delta\right)^{\beta/2} g(u) = (\ell_1/\ell_2)\,\delta_f(u),
\tag{30}
\]
which is equivalent to (12) up to a constant. Hence, we can apply the results of Section
2.2. Namely, after changing variables, the solution of (25) is a Gaussian field with
mean zero and the isotropic Matérn covariance function defined by (5). Notice that
the constant that multiplies the Delta function on the right-hand side of (30) and the
constant that multiplies the integral in (28) will cancel when going through the process
of deriving the covariance function since the constant in (28) gets squared.
We must now make one final change of variables back to w from u so our covariance
function will be in terms of the anisotropic coordinates rather than the isotropic
ones. Since the input to the Matérn correlation function must be a distance between
isotropic spatial locations, we need to represent an isotropic distance, r_u, in terms of the
anisotropic coordinates. Consider r := w_i − w_j. Then, defining r_w := ‖r‖ = ‖w_i − w_j‖,
\[
\begin{aligned}
r_u := \|u_i - u_j\|
&= \left\| \begin{bmatrix} \cos\theta & \sin\theta \\ -\tau\sin\theta & \tau\cos\theta \end{bmatrix} (w_i - w_j) \right\|
 = \left\| \begin{bmatrix} \cos\theta & \sin\theta \\ -\tau\sin\theta & \tau\cos\theta \end{bmatrix} r \right\| \\
&= \left\| \begin{bmatrix} \cos\theta & \sin\theta \\ -\tau\sin\theta & \tau\cos\theta \end{bmatrix}
   \begin{bmatrix} r_1 \\ r_2 \end{bmatrix} \right\|
 = \left\| \begin{bmatrix} r_1\cos\theta + r_2\sin\theta \\ -r_1\tau\sin\theta + r_2\tau\cos\theta \end{bmatrix} \right\|.
\end{aligned}
\]
Now we convert to polar coordinates with r_1 = r_w cos ψ and r_2 = r_w sin ψ. Then
\[
\begin{aligned}
r_u &= \left\| \begin{bmatrix} r_w\cos\psi\cos\theta + r_w\sin\psi\sin\theta \\ -r_w\tau\cos\psi\sin\theta + r_w\tau\sin\psi\cos\theta \end{bmatrix} \right\|
     = \left\| \begin{bmatrix} r_w\cos(\psi-\theta) \\ r_w\tau\sin(\psi-\theta) \end{bmatrix} \right\| \\
    &= r_w\left[\cos^2(\psi-\theta) + \tau^2\sin^2(\psi-\theta)\right]^{1/2}.
\end{aligned}
\]
Therefore, we need to adjust the distance between the vectors w_i and w_j by
[cos²(ψ − θ) + τ² sin²(ψ − θ)]^{1/2} in order to get the distances to plug into the isotropic
Matérn correlation function. Thus, the isotropic Matérn covariance function has been
generalized to the anisotropic case using the same change of variables as in (23). Adjusting
the anisotropic distances is equivalent to defining the anisotropic Matérn covariance
function as we have in (26).
□
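The distance identity at the heart of the proof can be confirmed numerically. In the sketch below (function names and numbers are illustrative assumptions), the isotropic distance is computed once by applying the matrix in (24) to the separation vector and once through the directional range ζ from (26); the two scaled distances r_u/ℓ_1 and r_w/ζ should coincide.

```python
import math


def isotropic_distance(r_vec, theta, tau):
    """Norm of the matrix in (24) applied to the separation r = w_i - w_j."""
    r1, r2 = r_vec
    u1 = r1 * math.cos(theta) + r2 * math.sin(theta)
    u2 = tau * (-r1 * math.sin(theta) + r2 * math.cos(theta))
    return math.hypot(u1, u2)


def directional_range(psi, theta, ell1, ell2):
    """Range parameter zeta in the direction psi, as defined in (26)."""
    tau = ell1 / ell2
    d = psi - theta
    return ell1 / math.sqrt(math.cos(d) ** 2 + tau ** 2 * math.sin(d) ** 2)


# Illustrative values: theta = 45 degrees, ratio tau = 3.
theta, ell1, ell2 = math.radians(45.0), 0.3, 0.1
r_w, psi = 0.25, math.radians(10.0)
r_vec = (r_w * math.cos(psi), r_w * math.sin(psi))

r_u = isotropic_distance(r_vec, theta, ell1 / ell2)
zeta = directional_range(psi, theta, ell1, ell2)
```

Evaluating `directional_range` at ψ = θ and at ψ = θ + π/2 recovers ℓ_1 and ℓ_2, respectively, which matches the interpretation of the two correlation lengths.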
4.3. Anisotropic Prior Modeling
To obtain a sparse representation of the precision matrix for the anisotropic Matérn
covariance, we can discretize (25) using the standard finite-difference approximations
with appropriate boundary conditions. Taking a step size of h = 1/n on a uniform
mesh, so that N = n² in two dimensions, yields
\[
\left[ I + \frac{1}{h^2}(a_\theta^2 + b_\theta^2)(L \otimes I) + \frac{1}{h^2}(c_\theta^2 + d_\theta^2)(I \otimes L)
 - \frac{2}{4h^2}(a_\theta c_\theta - b_\theta d_\theta)(K \otimes K) \right]^{\beta/2} x = \delta^{-1/2}\xi,
\qquad \xi \sim N(0, I_N),
\]
where ⊗ denotes the Kronecker product [10]. Note that the constant multiplying the white
noise term gets absorbed into the δ hyperparameter.
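In the zero boundary condition case,
\[
L = \begin{bmatrix}
2 & -1 & 0 & \cdots & 0 \\
-1 & 2 & -1 & \cdots & 0 \\
0 & -1 & 2 & \ddots & \vdots \\
\vdots & \vdots & \ddots & \ddots & -1 \\
0 & 0 & \cdots & -1 & 2
\end{bmatrix}_{n \times n}
\quad\text{and}\quad
K = \begin{bmatrix}
0 & 1 & 0 & \cdots & 0 \\
-1 & 0 & 1 & \cdots & 0 \\
0 & -1 & 0 & \ddots & \vdots \\
\vdots & \vdots & \ddots & \ddots & 1 \\
0 & 0 & \cdots & -1 & 0
\end{bmatrix}_{n \times n},
\]
and when using periodic boundary conditions, we let L(1, n) = L(n, 1) = K(1, n) = −1
and K(n, 1) = 1. Then
\[
x \sim N\left(0, \delta^{-1}P^{-1}\right),
\tag{31}
\]
where
\[
P = \left[ I + \frac{1}{h^2}(a_\theta^2 + b_\theta^2)(L \otimes I) + \frac{1}{h^2}(c_\theta^2 + d_\theta^2)(I \otimes L)
 - \frac{2}{4h^2}(a_\theta c_\theta - b_\theta d_\theta)(K \otimes K) \right]^{\beta}.
\tag{32}
\]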
In order to retain sparsity in P, we will again require that β = ν + d/2 be an integer.
Additionally, like in the isotropic case, an extension of the computational domain is
required to maintain a connection between (32) and (26).
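To make the construction of (32) concrete, here is a small dense sketch. It is an illustration under stated assumptions: the function names are hypothetical, the matrices are stored densely with NumPy for readability (a practical implementation would use sparse storage), and the grid size and hyperparameter values are arbitrary.

```python
import numpy as np


def second_difference(n, periodic=True):
    """Matrix L: 2 on the diagonal, -1 on the first off-diagonals."""
    L = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    if periodic:
        L[0, n - 1] = L[n - 1, 0] = -1.0
    return L


def central_difference(n, periodic=True):
    """Matrix K: +1 above and -1 below the diagonal."""
    K = np.eye(n, k=1) - np.eye(n, k=-1)
    if periodic:
        K[0, n - 1] = -1.0
        K[n - 1, 0] = 1.0
    return K


def anisotropic_precision(n, theta, ell1, ell2, beta, periodic=True):
    """Dense version of P in (32) on an n x n grid with h = 1/n.
    beta = nu + d/2 must be a positive integer."""
    h = 1.0 / n
    a, b = ell2 * np.sin(theta), ell1 * np.cos(theta)
    c, d = ell2 * np.cos(theta), ell1 * np.sin(theta)
    I = np.eye(n)
    L = second_difference(n, periodic)
    K = central_difference(n, periodic)
    B = (np.eye(n * n)
         + (a**2 + b**2) / h**2 * np.kron(L, I)
         + (c**2 + d**2) / h**2 * np.kron(I, L)
         - 2.0 * (a * c - b * d) / (4.0 * h**2) * np.kron(K, K))
    return np.linalg.matrix_power(B, beta)


# Illustrative call: 8 x 8 grid, theta = 45 degrees, nu = 1 so beta = 2.
P = anisotropic_precision(8, np.pi / 4, 0.3, 0.1, beta=2)
```

Since K is antisymmetric, K ⊗ K is symmetric, so P is symmetric as a precision matrix should be; when ℓ_1 = ℓ_2 the cross term vanishes and the isotropic operator is recovered.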
Now that we have a prior covariance matrix that maintains a connection to the
anisotropic Matérn covariance, we return to the MAP estimator, which can be computed
by solving
\[
x_\alpha = \arg\min_{x} \left\{ \frac{1}{2}\|Ax - b\|^2 + \frac{\alpha}{2}\, x^T P x \right\}
         = \left(A^T A + \alpha P\right)^{-1} A^T b,
\tag{33}
\]
where α = δ/λ and P is as in (32).
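A toy version of the MAP computation (33) is sketched below. Several simplifications are assumed and should not be read as the paper's setup: the problem is one-dimensional, the precision matrix is a stand-in of the form (I + cL)², the forward model A is a pure masking operator, α is picked by hand, and the system is solved directly rather than by PCG.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 50

# Toy 1-D precision matrix (I + c * L)^2 standing in for P in (32).
L = 2.0 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)
P = np.linalg.matrix_power(np.eye(N) + 25.0 * L, 2)

# Inpainting forward model: A keeps only the observed entries of x.
x_true = np.sin(np.linspace(0.0, 2.0 * np.pi, N))
keep = np.sort(rng.choice(N, size=30, replace=False))
A = np.eye(N)[keep]
b = A @ x_true + 0.01 * rng.standard_normal(keep.size)

alpha = 1e-4  # illustrative value, not selected by GCV

# Closed-form minimizer from (33).
x_alpha = np.linalg.solve(A.T @ A + alpha * P, A.T @ b)


def objective(x):
    """Penalized least-squares functional minimized in (33)."""
    return 0.5 * np.sum((A @ x - b) ** 2) + 0.5 * alpha * x @ P @ x
```

For the image-sized problems in the paper a direct solve is impractical, which is why an iterative method that only needs products with A, Aᵀ, and P is used instead.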
4.4. Directional Semivariograms
When fitting semivariograms to a spatial field, intrinsic stationarity and isotropy are
assumed. In our case, we are still assuming intrinsic stationarity, but our field is
anisotropic. Thus, a change must be made to our field before fitting a semivariogram to
obtain an estimate for ℓ_1 and ℓ_2. We again use (24), the inverse of the change used in
(23). Using the same argument that was used when transforming the Green's function
PDE from anisotropic coordinates in (29) to isotropic coordinates in (30), it is not
difficult to show that (25) is transformed to
\[
\left(1 - \ell_1^2 \Delta\right)^{(\nu + d/2)/2} x(u) = W(u),
\]
which is equivalent to (6) with ℓ = ℓ_1.
We can apply this same change of variables (24) to any two-dimensional spatial
ﬁeld that exhibits geometric anisotropy to achieve isotropy. For example, if we begin
with a spatial ﬁeld that exhibits its larger correlation length in the 45◦direction with
τ = ℓ1/ℓ2 = 3, the change of variables will rotate the ﬁeld so the direction of maximum
correlation length is in the 0◦direction and will then stretch the ﬁeld along the new
y-axis to remove the geometric anisotropy and create a new, isotropic ﬁeld. This is
shown in the middle in Figure 6. Once the spatial ﬁeld has been adjusted in this way,
a semivariogram can be ﬁt to the transformed ﬁeld as in the usual, isotropic case.
Figure 6. Directional semivariograms. The directional semivariograms for the
original, anisotropic field are shown on the left. For each of the 12 images, the
semivariogram value is plotted against lag distance. The direction of maximum
correlation is determined to be 45° with a ratio of 3 since the distance required to
pass γ_crit = 0.9 was largest in that direction and that distance is 3 times greater than
the distance needed in the −45° direction. The directional semivariograms for the
rotated and scaled field are shown on the right with a ratio of 1.
[Panels: Original Field, Rotated Field, Rotated and Scaled Field. Ranges printed on the
left-hand semivariogram panels: 0.08 at 90°, 75°, 15° and 0°; 0.14 at 60° and 30°; 0.17 at
45°; 0.06 at −15° through −75°. Ranges printed on the right-hand panels: 0.17 in all 12
directions.]

In order to adjust the spatial field to satisfy the isotropy assumptions in the way
described above, we must ascertain θ, the direction of maximum correlation length
measured from the x-axis, and τ, the ratio of the correlation length in the direction of θ
to the correlation length in the direction orthogonal to θ. Both of these parameters can
be estimated using directional empirical semivariograms. Directional semivariograms
are fit in a similar way as omnidirectional semivariograms in (20), but instead of taking
all points separated by a distance r, we restrict the pairs of points to a certain angle, ψ.
If we think of w_i and w_j as vectors, then ψ is equivalent to the angle between w_i − w_j
and the x-axis. For example, if ψ = 0, we restrict to all pairs of locations w_i and w_j
on the same horizontal line, i.e., with the same y-coordinate. Formally, the empirical
directional semivariogram can be defined as
\[
\hat{\gamma}_\psi(r) = \frac{1}{2n(r, \psi)} \sum_{(i,j)\,:\,\|w_i - w_j\| = r,\ \phi_{ij} = \psi} \left[z(w_i) - z(w_j)\right]^2,
\tag{34}
\]
where φ_ij denotes the angle between w_i − w_j and the x-axis, and n(r, ψ) is the number
of pairs of points that are separated by a distance r with angle of separation equal to ψ.
It is common to calculate a directional semivariogram for −90° < ψ ≤ 90° in steps of
either 15° or 30°. We take a step size of 15° here, which will result in 12 directional
semivariograms.
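A minimal sketch of the estimator (34) for gridded data follows. It makes simplifying assumptions that differ from a full implementation: only directions that fall exactly on grid offsets are handled (the 15° steps used in the paper would in practice need an angular tolerance), distances are reported in pixels, masked pixels are represented by NaN, and the function name is hypothetical. With image rows increasing downward, the offset (−1, 1) corresponds to ψ = 45° if the y-axis points up.

```python
import numpy as np


def directional_semivariogram(z, step, max_lag):
    """Empirical semivariogram of a gridded field z along one grid direction.

    step = (drow, dcol) is the smallest grid offset in the chosen direction,
    e.g. (0, 1) for horizontal pairs or (1, 1) for a diagonal.
    NaN entries (masked pixels) are skipped.
    Returns (distances in pixels, semivariogram values).
    """
    drow, dcol = step
    nrow, ncol = z.shape
    dists, gammas = [], []
    for k in range(1, max_lag + 1):
        dr, dc = k * drow, k * dcol
        # Overlapping windows: head[i, j] and tail[i, j] are dr rows
        # and dc columns apart in the original array.
        r0, r1 = max(0, -dr), min(nrow, nrow - dr)
        c0, c1 = max(0, -dc), min(ncol, ncol - dc)
        head = z[r0:r1, c0:c1]
        tail = z[r0 + dr:r1 + dr, c0 + dc:c1 + dc]
        diff = head - tail
        valid = ~np.isnan(diff)
        if valid.any():
            dists.append(k * np.hypot(drow, dcol))
            # 1 / (2 n) times the sum of squared differences, as in (34).
            gammas.append(0.5 * np.mean(diff[valid] ** 2))
    return np.array(dists), np.array(gammas)
```

Skipping NaN pairs mirrors the earlier remark that removed entries are not considered when fitting semivariograms to a masked image.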
Once the directional semivariograms have been calculated for each of the 12
diﬀerent ψ angles, we ﬁt a common scatterplot smoother, the loess curve [40], to the
semivariogram values in each direction to achieve continuous curves. Then, to determine
the ratio of correlation lengths, we can select a constant γcrit value between the nugget
and sill and observe the distance required for the loess curve to surpass the height of
γcrit. The direction of maximum correlation, θ, will require a larger distance to reach
γcrit than other directions since the variance of the diﬀerences between values in that
direction is expected to be smaller. The anisotropy ratio, τ, can then be computed as
the ratio between the distance in the direction of θ and the distance in the direction
perpendicular to θ.
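The γ_crit procedure can be sketched as below. This is an illustrative simplification rather than the exact procedure used here: the curves are assumed to be already smoothed (the loess step is not reproduced), the crossing distance is found by linear interpolation between lags, and the function names are hypothetical.

```python
import math


def crossing_distance(lags, gamma, gamma_crit):
    """First distance at which the curve reaches gamma_crit,
    using linear interpolation between neighbouring lags."""
    for k in range(1, len(lags)):
        if gamma[k - 1] < gamma_crit <= gamma[k]:
            t = (gamma_crit - gamma[k - 1]) / (gamma[k] - gamma[k - 1])
            return lags[k - 1] + t * (lags[k] - lags[k - 1])
    return None  # the curve never reaches gamma_crit


def estimate_anisotropy(curves, gamma_crit):
    """curves maps an angle in degrees, in (-90, 90], to (lags, gamma values).

    Returns (theta, tau): the angle with the largest crossing distance and
    the ratio of that distance to the one in the perpendicular direction.
    Assumes the perpendicular direction is among the supplied angles.
    """
    ranges = {}
    for psi, (lags, gamma) in curves.items():
        dist = crossing_distance(lags, gamma, gamma_crit)
        if dist is not None:
            ranges[psi] = dist
    theta = max(ranges, key=lambda psi: ranges[psi])
    perp = theta - 90 if theta > 0 else theta + 90  # stay within (-90, 90]
    return theta, ranges[theta] / ranges[perp]
```

With 15° steps the perpendicular of every candidate direction is itself one of the 12 directions, so the ratio can always be formed from the computed curves.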
This process is illustrated in Figure 6. The directional semivariograms are shown
for the original, anisotropic field on the left. We can see that the correlation length is
largest in the 45° direction since the distance of 0.1684 that it takes for the curve to
pass γ_crit = 0.9 is the largest of any direction. The range distance in the −45° direction
is 0.0561 and so the ratio of those ranges is τ = 0.1684/0.0561 = 3.
We can then rotate the field clockwise by 45° and stretch it in the direction of the
new y-axis by a factor of τ = 3 to achieve an isotropic field, as is done in the middle
of Figure 6. The directional semivariograms for the new field are shown on the right in
Figure 6. It now takes a distance of 0.1684 for the variogram values to pass γ_crit for each
ψ angle, which means the ratio has been reduced to one, as it should be for an isotropic
field. It is not always the case that we can reduce the ratio of these range values down to
one, but we can reduce it enough for the field to be considered approximately isotropic.
Once we have obtained θ and τ and have changed the coordinates of the field, we
can fit an isotropic omnidirectional semivariogram to estimate ν and ℓ_1. Then we let
ℓ_2 = ℓ_1/τ. All hyperparameters for use in (31) will have been estimated and we can
update these estimates iteratively using Algorithm 2. The convergence criteria for these
hyperparameters are as follows: θ_j − θ_{j−1} = 0, ν_j − ν_{j−1} = 0,
|ℓ_1^j − ℓ_1^{j−1}|/ℓ_1^{j−1} < 0.01, and |ℓ_2^j − ℓ_2^{j−1}|/ℓ_2^{j−1} < 0.01, where θ_j, ν_j,
ℓ_1^j, and ℓ_2^j denote the jth iterates of the respective hyperparameters.
4.5. Numerical Experiments
We will illustrate the semivariogram method in the anisotropic case with a two-
dimensional inpainting example.
The original image, given on the left in Figure 7,
shows a rock formation in Northern Arizona known as the Wave [41] where the layers
of sandstone strata are clearly visible. We selected a subsection in the lower-middle of
the image, shown in the middle of Figure 7, to illustrate our method. This will be the
true image. We then added some noise and masked 60% of the image. This is shown
on the right in Figure 7.
Like we saw in Section 3.1.1, the prior will play a large role in the inpainting
process since much of the image is missing. We will directly compare the solution using
the anisotropic Whittle-Mat´ern prior to the solution using the isotropic Whittle-Mat´ern
prior, both of which will have hyperparameters determined using semivariograms. Like
before, the regularization parameter, α, will be optimized using the highest correlation
between the solution and the true image.
Algorithm 2 The Semivariogram Method for MAP Estimation with Anisotropic
Whittle-Matérn Prior.
0. Set x_α = b.
1. Estimate θ and τ by computing directional semivariograms for x_α.
2. Transform the anisotropic spatial field coordinates, w, to isotropic spatial field
coordinates, u, using (24).
3. Estimate θ = (a_0, σ², ν, ℓ_1) by fitting an isotropic Matérn semivariogram model to
the transformed field. Then compute ℓ_2 = ℓ_1/τ.
4. Define the prior precision matrix, P, by (32) using ν, ℓ_1, ℓ_2, and θ, compute α using
(19), and compute x_α using (33).
5. Return to step 1 and repeat until θ, τ, ν, ℓ_1, and ℓ_2 stabilize.

Figure 7. Inpainting example. The original image showing the rock layers of the
Wave in northern Arizona is given on the left. The true image used in the inpainting
example is given in the middle. The masked image is given on the right.

Figure 8. Inpainting solutions. The true image (left) is given along with the
isotropic solution (middle) and the anisotropic solution (right).
After calculating the directional semivariograms for the image, the direction of
maximum correlation was determined to be −75° for each color intensity. For the blue
color band, the correlation length in that direction was ℓ_1 = 0.1517 and the correlation
length in the 15° direction was ℓ_2 = 0.0101, which gives a ratio of τ = 15. The smoothness
parameter ν was determined to be 1. All of these hyperparameters converged in at most
four iterations for each color, and the initial θ estimate of −75° given in the first iteration
remained unchanged throughout the process. When fitting an omnidirectional
semivariogram to the masked image for the isotropic case, we obtained ν = 2 and ℓ = 0.0142.
The reconstructions are given in Figure 8. With the isotropic solution, the masking
is removed, but since the prior assigns a very small correlation between each pixel, the
reconstruction is noticeably spotty. The anisotropic solution, however, does a good job
of removing the masking completely. The reconstruction is a bit smoother than the true
image, but the original sandstone layers can be seen nicely.
Some statistics of the reconstructions are given in Table 1. Although the isotropic
solution is still competitive, the anisotropic prior gives the reconstruction that most
closely aligns with the true image. The isotropic solution has a mean absolute error
(MAE) more than 57% larger and a mean squared error (MSE) more than 180% higher
than those respective measures in the anisotropic case. The anisotropic reconstruction
does fall short with the minimum value, however, which is farther from the truth than
the solution given by the isotropic prior.

Table 1. Statistics for inpainting MAP estimates.

                   True Image   Isotropic Covariance   Anisotropic Covariance
x̄                  0.530        0.530                  0.530
s                  0.207        0.202                  0.206
Min                0.000        −0.053                 −0.065
Q1                 0.357        0.364                  0.360
Median             0.522        0.520                  0.520
Q3                 0.678        0.676                  0.678
Max                1.000        1.112                  1.093
ρ(x_α, x_true)                  0.944                  0.981
Residual MAE                    0.045                  0.029
Residual MSE                    0.005                  0.002
4.6. Discussion
Although the reconstruction with the anisotropic prior covariance matrix is better here,
there are still some improvements that can be made. This example had a constant angle
of maximum correlation length throughout the image and the ratio between maximum
and minimum correlation length was rather high, that is, greater than five. If either of
these features fails to hold, the anisotropic prior often produces a reconstruction that
performs slightly worse than, or offers no benefit over, an isotropic prior. We focus on the
case when the angle of maximum anisotropy is not constant in the next section.
5. Regional Anisotropy
We have a way to deﬁne priors for isotropic and anisotropic spatial ﬁelds as long as that
covariance structure is consistent for the entire ﬁeld. In the case where the correlation
length and angle of maximum anisotropy change throughout the image, we will want to
model each of these regions with a diﬀerent prior.
5.1. Regional Precision Matrix
Suppose we have k different regions in our image, each of which has a different covariance
structure. We define D_i, i = 1, . . . , k, as a masking matrix such that the only non-zero
elements of D_i x are those in region i. We will not allow for overlapping regions, so that
Σ_{i=1}^{k} D_i = I, the identity matrix. Now, to establish a prior for x in this regional case,
we take Cov(x) = Cov(D_1 x + . . . + D_k x) = Cov(D_1 x) + . . . + Cov(D_k x), since each
region is assumed independent due to not having any elements of x in common. Define
the best Whittle-Matérn covariance structure, as chosen by a semivariogram, for region
i as C_i with corresponding precision matrix P_i = C_i^{−1}. Then Cov(D_i x) := D_i C_i D_i.
Thus, the prior for x in this regional case has pdf
\[
p(x \mid \delta) \propto \exp\left( -\frac{\delta}{2}\, x^T \left(D_1 C_1 D_1 + \cdots + D_k C_k D_k\right)^{-1} x \right),
\tag{35}
\]
which means our precision matrix is given by P = (D_1 C_1 D_1 + . . . + D_k C_k D_k)^{−1}. Note
that (35) reduces to (3) with P = P_1 when k = 1. In general, the C_i matrices and P
are dense, so actually constructing this precision matrix is infeasible for large problems.
Additionally, FFTs cannot be used since D_1 C_1 D_1 + . . . + D_k C_k D_k is not circulant
even if each C_i is. Thus, we seek an alternative expression such that the matrix-vector
multiplication Px is achievable.
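Without loss of generality, let k = 2. Let
\[
C_1 = \begin{bmatrix} C_{1A} & C_{1B} \\ C_{1C} & C_{1D} \end{bmatrix}
    = P_1^{-1}
    = \begin{bmatrix} P_{1A} & P_{1B} \\ P_{1C} & P_{1D} \end{bmatrix}^{-1}
\quad\text{and}\quad
C_2 = \begin{bmatrix} C_{2A} & C_{2B} \\ C_{2C} & C_{2D} \end{bmatrix}
    = P_2^{-1}
    = \begin{bmatrix} P_{2A} & P_{2B} \\ P_{2C} & P_{2D} \end{bmatrix}^{-1}.
\]
Also assume that the regions are defined in a way that divides the region vertically (an
assumption we will drop later) so that
\[
C = \mathrm{Cov}(x) = \mathrm{Cov}(D_1 x + D_2 x) = \mathrm{Cov}(D_1 x) + \mathrm{Cov}(D_2 x)
  = \begin{bmatrix} C_{1A} & 0 \\ 0 & C_{2D} \end{bmatrix},
\]
which means our precision matrix is
\[
P = C^{-1} = \begin{bmatrix} C_{1A}^{-1} & 0 \\ 0 & C_{2D}^{-1} \end{bmatrix}.
\]
Using the block matrix inversion identity, it can be shown that
C_{1A}^{−1} = P_{1A} − P_{1B} P_{1D}^{−1} P_{1C} and C_{2D}^{−1} = P_{2D} − P_{2C} P_{2A}^{−1} P_{2B}, and thus
\[
P = C^{-1}
  = \begin{bmatrix} C_{1A}^{-1} & 0 \\ 0 & C_{2D}^{-1} \end{bmatrix}
  = \begin{bmatrix} P_{1A} - P_{1B} P_{1D}^{-1} P_{1C} & 0 \\ 0 & P_{2D} - P_{2C} P_{2A}^{-1} P_{2B} \end{bmatrix},
\]
which can be equivalently written as
\[
P = D_1 P_1 D_1 - D_1 P_1 (D_2 P_1 D_2)^{\dagger} P_1 D_1 + D_2 P_2 D_2 - D_2 P_2 (D_1 P_2 D_1)^{\dagger} P_2 D_2.
\]
In general, for k > 2,
\[
\begin{aligned}
P = C^{-1} &= \left(D_1 C_1 D_1 + D_2 C_2 D_2 + \cdots + D_k C_k D_k\right)^{-1} \\
&= \sum_{i=1}^{k} \left\{ D_i P_i D_i - D_i P_i \left[ (I_N - D_i) P_i (I_N - D_i) \right]^{\dagger} P_i D_i \right\},
\end{aligned}
\tag{36}
\]
which, since each P_i is sparse, involves only sparse matrices. It is straightforward to
show (36) holds even in the case where the regions do not divide the region vertically
by performing a reordering of the indices of x.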
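Identity (36) is easy to test numerically on a small problem. The sketch below is only a check under illustrative assumptions: the precision matrices are random symmetric positive definite stand-ins rather than Whittle-Matérn precisions, the two regions are deliberately interleaved to exercise the non-contiguous case, and everything is stored densely.

```python
import numpy as np

rng = np.random.default_rng(0)
N, k = 8, 2


def random_spd(n):
    """Random symmetric positive definite matrix standing in for a precision P_i."""
    M = rng.standard_normal((n, n))
    return M @ M.T + n * np.eye(n)


# Two interleaved (non-contiguous) regions; the D_i are 0/1 diagonal masks
# that sum to the identity.
labels = np.array([0, 1, 0, 0, 1, 1, 0, 1])
D = [np.diag((labels == i).astype(float)) for i in range(k)]
P_list = [random_spd(N) for _ in range(k)]
C_list = [np.linalg.inv(Pi) for Pi in P_list]

# Left-hand side of (36): invert the regional covariance directly.
C = sum(Di @ Ci @ Di for Di, Ci in zip(D, C_list))
P_direct = np.linalg.inv(C)

# Right-hand side of (36): uses only the precision matrices P_i.
I = np.eye(N)
P_formula = np.zeros((N, N))
for Di, Pi in zip(D, P_list):
    S = (I - Di) @ Pi @ (I - Di)
    P_formula += Di @ Pi @ Di - Di @ Pi @ np.linalg.pinv(S) @ Pi @ Di
```

The two matrices agree to rounding error, and because the regions here are interleaved, the check also illustrates that no reordering of x is needed for the formula itself to hold.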

Since we have an expression for P, we can now discuss how to perform the
multiplication Px. This will be needed to perform an iterative inverse method such as
conjugate gradient to obtain the MAP estimator. Since each D_i and P_i is sparse, each
matrix-vector multiplication in (36) is efficient except the ones involving pseudoinverses.
We can, however, take advantage of the lower-rank structure of [(I_N − D_i)P_i(I_N − D_i)]†,
which has rank N − r_i, where r_i is the rank of D_i. Let P_{i,nz} be the square matrix that
consists of all rows and columns of (I_N − D_i)P_i(I_N − D_i) that have any nonzero elements.
That is, keep row and column j of (I_N − D_i)P_i(I_N − D_i) if [I_N − D_i]_{j,j} = 1. Then let
R_i = chol(P_{i,nz}) such that P_{i,nz} = R_iᵀR_i, where chol denotes the Cholesky factorization
and R_i is upper triangular. The Cholesky decomposition is known to be efficient for
sparse, symmetric, positive definite matrices such as P_{i,nz} [42]. Then we can perform
the multiplication of D_iP_i[(I_N − D_i)P_i(I_N − D_i)]†P_iD_ix in the following way:
1. Multiply y_i = P_i(D_i x).
2. Extract the N − r_i elements of y_i that correspond to the nonzero diagonal elements
of I_N − D_i: y_i(ind).
3. Define a variable z_i as an N × 1 vector of zeros.
4. Multiply by [(I_N − D_i)P_i(I_N − D_i)]† by taking z_i(ind) = R_i\(R_iᵀ\y_i(ind)).
5. Complete the multiplication D_i(P_i z_i).
6. Repeat for 1 ≤ i ≤ k and combine with the first term in (36), so
Px = Σ_{i=1}^{k} [D_i y_i − D_i(P_i z_i)].
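The steps above can be sketched in NumPy as follows. This is a dense illustration under stated assumptions, not the sparse implementation described in the text: `np.linalg.cholesky` returns a lower-triangular factor (the transpose of R_i), a general solver stands in for the two triangular solves, and the pseudoinverse term is subtracted from D_iP_iD_ix as (36) requires. The function names are hypothetical.

```python
import numpy as np


def precompute_factors(D_masks, P_list):
    """Cholesky factors of P_i restricted to the complement of region i.
    These are computed once and reused in every CG iteration."""
    return [np.linalg.cholesky(Pi[np.ix_(~m, ~m)])
            for m, Pi in zip(D_masks, P_list)]


def regional_matvec(x, D_masks, P_list, chol_factors):
    """Compute P x for P in (36) without forming any pseudoinverse.

    D_masks[i] is a boolean vector marking region i, and chol_factors[i]
    is the lower Cholesky factor returned by precompute_factors.
    """
    out = np.zeros_like(x)
    for mask, Pi, Lc in zip(D_masks, P_list, chol_factors):
        ind = ~mask                               # indices outside region i
        y = Pi @ np.where(mask, x, 0.0)           # step 1: y_i = P_i (D_i x)
        z = np.zeros_like(x)                      # step 3
        # Steps 2 and 4: solve P_{i,nz} z(ind) = y(ind) using the factor.
        z[ind] = np.linalg.solve(Lc.T, np.linalg.solve(Lc, y[ind]))
        correction = np.where(mask, Pi @ z, 0.0)  # step 5: D_i (P_i z_i)
        out += np.where(mask, y, 0.0) - correction
    return out
```

In a production setting one would replace the dense factorization and solves with sparse Cholesky and triangular solves, which is where the efficiency discussed in the text comes from.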
Step 4 is the most costly since it requires both a forward and a backward
substitution. It is cheaper for large regions, since the rank of
(I_N − D_i) P_i (I_N − D_i) decreases as the size of region i increases. Sparse reorderings,
such as the symmetric approximate minimum degree permutation, can also be used so that R_i has
fewer nonzero entries. The product Px must be computed at every iteration of
CG, but each R_i can be stored ahead of time, so the Cholesky factorizations need only
be computed once. We also saw some improvement in the performance of the CG algorithm
when a preconditioner was used: the total number of iterations was approximately 21%
lower, which corresponded to an overall time saving of about 15%.
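
To illustrate how such a matrix-vector product enters CG, the following sketch solves a small inpainting-type system with an optional preconditioner. Several ingredients are assumptions made only for illustration and are not taken from the text: the system matrix A^T A + λP with a 0/1 diagonal mask A, the value of λ, the tridiagonal stand-in for P, and the Jacobi (diagonal) preconditioner, since the preconditioner used in the experiments is not specified. The function name pcg is hypothetical.

```python
import numpy as np

def pcg(matvec, b, precond=None, tol=1e-10, max_iter=500):
    """Preconditioned conjugate gradient for a symmetric positive definite operator."""
    x = np.zeros_like(b)
    r = b.copy()                                   # residual for the zero initial guess
    z = precond(r) if precond else r.copy()
    p = z.copy()
    rz = r @ z
    b_norm = np.linalg.norm(b)
    for k in range(1, max_iter + 1):
        Ap = matvec(p)                             # the only place the operator is applied
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * b_norm:
            return x, k
        z = precond(r) if precond else r.copy()
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, max_iter

rng = np.random.default_rng(2)
N = 50
obs = (rng.random(N) > 0.6).astype(float)          # 1 where a pixel is observed (assumed mask)
# Stand-in sparse SPD precision: second-difference matrix plus a small shift.
P = 2.1 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)
lam = 0.1                                          # assumed regularization weight
rhs = obs * rng.standard_normal(N)                 # A^T b for a diagonal 0/1 mask A

def H_matvec(v):
    # In the regional setting, P @ v would be replaced by the factor-reusing product Px.
    return obs * v + lam * (P @ v)

H_diag = obs + lam * np.diag(P)                    # diagonal of A^T A + lam * P
x_cg, n_plain = pcg(H_matvec, rhs)
x_pcg, n_prec = pcg(H_matvec, rhs, precond=lambda r: r / H_diag)

x_direct = np.linalg.solve(np.diag(obs) + lam * P, rhs)
print("iterations without / with preconditioner:", n_plain, n_prec)
```

Because CG touches the operator only through matvec, any factorization needed to apply P can be computed once outside the loop and reused at every iteration.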
5.2. Numerical Experiments
We now consider an example where the angle of maximum anisotropy changes
throughout the image. We take the central portion of the Wave image from Figure
7 and again mask it so that 60% of the image is blank. Then we attempt to inpaint the
image using an isotropic prior, an anisotropic prior, and a regional anisotropic prior.
The results are shown in Figure 9. The top-right image shows the masked picture as
well as how the regions were chosen. The first region is shown with the red overlay
while the second region is the remainder of the image. Semivariograms were fit to both
regions and the top region was given a prior with an angle of maximum anisotropy of

Figure 9. Inpainting solutions. The true image (top-left) is given along with the
masked image (top-right), the isotropic solution (bottom-left), the anisotropic
solution (bottom-middle), and the regional solution (bottom-right).
Table 2. Statistics for regional inpainting MAP estimates.

                    True Image   Isotropic    Anisotropic   Regional
                                 Covariance   Covariance    Covariance
x̄                   0.567        0.565        0.566         0.566
s                   0.207        0.200        0.202         0.207
Min                 0.000        −0.014       −0.082        −0.032
Q1                  0.400        0.402        0.402         0.398
Median              0.565        0.564        0.564         0.562
Q3                  0.722        0.717        0.718         0.720
Max                 1.000        1.085        1.077         1.160
ρ_{x_α, x_true}     -            0.954        0.954         0.969
Residual MAE        -            0.042        0.041         0.035
Residual MSE        -            0.004        0.004         0.003
−30° while θ = −75° for the bottom region. In the anisotropic solution given in the
bottom-middle of the figure, θ = −75° throughout the image. Qualitatively, the regional
solution in the bottom-right of the figure looks best.
Table 2 reports statistics comparing the different reconstructions.
The isotropic and anisotropic solutions were similar in terms of correlation and mean
errors (correlation 0.954 for both, residual MAE 0.042 and 0.041), whereas the regional
solution is better in both categories (correlation 0.969, residual MAE 0.035) and is
similar in the others.
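
Statistics of the kind reported in Table 2 can be computed with a few lines of NumPy. The sketch below rests on assumptions that the text does not state explicitly: s is taken to be the sample standard deviation, ρ the Pearson correlation between the reconstruction and the true image, and the residual the pixelwise difference between them. The function name reconstruction_stats and the toy arrays are hypothetical.

```python
import numpy as np

def reconstruction_stats(x_rec, x_true):
    """Summary statistics of a reconstruction and its agreement with the true image."""
    x_rec = np.asarray(x_rec, dtype=float).ravel()
    x_true = np.asarray(x_true, dtype=float).ravel()
    resid = x_rec - x_true                         # assumed definition of the residual
    q1, median, q3 = np.percentile(x_rec, [25, 50, 75])
    return {
        "mean": x_rec.mean(),
        "std": x_rec.std(ddof=1),                  # sample standard deviation (assumption)
        "min": x_rec.min(),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": x_rec.max(),
        "corr": np.corrcoef(x_rec, x_true)[0, 1],  # Pearson correlation
        "mae": np.mean(np.abs(resid)),             # residual mean absolute error
        "mse": np.mean(resid ** 2),                # residual mean squared error
    }

# Toy example with made-up pixel values, only to show the call.
x_true = np.array([[0.0, 0.5], [1.0, 0.5]])
x_rec = np.array([[0.1, 0.5], [0.9, 0.5]])
stats = reconstruction_stats(x_rec, x_true)
for name, value in stats.items():
    print(f"{name:>6s}: {value:.3f}")
```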

5.3. Discussion
The regional covariance solution performed better in this example, but it does have
some shortcomings. First, it is best used when the regions are clearly distinct,
because the transition between regions under this prior is abrupt rather
than smooth. Smoothing the transition between regions is left to future
work. Second, since multiplying P by x requires inverting a matrix, this method
can be slow when that matrix is large, which corresponds to a small region. Therefore,
we suggest using small regions only when necessary. Alternatively, it is possible to solve
a different inverse problem for each region independently and then combine the results.
This allows FFTs to be used, since the precision matrix for each inverse problem will
be of the form (32).
6. Conclusion
In this paper, we introduced a method for selecting hyperparameters for use in the
prior distribution of x based on semivariogram modeling. We think of the noisy
data as a spatial field and fit semivariograms first to the noisy data and then iteratively
to the MAP estimates to obtain point estimates for the prior hyperparameters. This
method relies on the fact that the solution of the SPDE (6) is a Gaussian process with
zero mean and Matérn covariance operator, which we have shown in detail. However,
this connection requires an infinite domain, for us R^2. For a finite domain, which is
typically required for computations, the connection is broken, i.e., the SPDE solution
is a zero-mean Gaussian process without a Matérn covariance operator. Fortunately,
the connection can be restored by extending the finite computational domain. We
showed how to systematically choose the extended domain using the Matérn parameters.
The semivariogram method has the benefits of giving point estimates with a more
intuitive interpretation while providing an objective way to choose an extension of the
computational domain that is adequate for restoring the SPDE/Matérn connection.
We then applied the semivariogram method to isotropic inpainting and deblurring
examples in two dimensions.
We generalized the isotropic results to the anisotropic case and showed that the
semivariogram method can be applied there as well by using directional semivariograms and
the anisotropic SPDE (25). An inpainting example comparing reconstructions using
isotropic and anisotropic priors was presented. Finally, we discussed an even more
general case in which the image has regions with differing correlation lengths and angles of
maximum correlation, which requires a sparse precision matrix that can be obtained via
a discretized SPDE for each region. A final inpainting example showed the regional prior
giving a better reconstruction than the isotropic and anisotropic priors.

Acknowledgments
J. Bardsley acknowledges support from the Gordon Preston Fellowship offered by the
School of Mathematics at Monash University. T. Cui acknowledges support from the
Australian Research Council, under grant number CE140100049 (ACEMS). We would
also like to acknowledge the assistance of Dr. Jon Graham at the University of Montana
with the semivariogram methodology.
References
[1] Jari Kaipio and Erkki Somersalo. Statistical and Computational Methods for Inverse Problems. Springer, 2005.
[2] Michael L Stein. Interpolation of Spatial Data: Some Theory for Kriging. Springer Science & Business Media, 2012.
[3] Peter Guttorp and Tilmann Gneiting. Studies in the history of probability and statistics XLIX On the Matérn correlation family. Biometrika, 93(4):989–995, 12 2006.
[4] Bertil Matérn. Spatial Variation, volume 36. Springer Science & Business Media, 2013.
[5] Larry C Andrews. Special Functions of Mathematics for Engineers. McGraw-Hill New York, 1992.
[6] Budiman Minasny and Alex B McBratney. The Matérn function as a general model for soil variograms. Geoderma, 128(3-4):192–207, 2005.
[7] Finn Lindgren, Håvard Rue, and Johan Lindström. An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 73(4):423–498, 2011.
[8] Andrew TA Wood and Grace Chan. Simulation of Stationary Gaussian Processes in [0, 1]^d. Journal of Computational and Graphical Statistics, 3(4):409–432, 1994.
[9] Claude R Dietrich and Garry N Newsam. Fast and exact simulation of stationary Gaussian processes through circulant embedding of the covariance matrix. SIAM Journal on Scientific Computing, 18(4):1088–1107, 1997.
[10] Johnathan M Bardsley. Computational Uncertainty Quantification for Inverse Problems. SIAM, 2018.
[11] Peter Whittle. On Stationary Processes in the Plane. Biometrika, 41(3/4):434–449, 1954.
[12] Lassi Roininen, Mark Girolami, Sari Lasanen, and Markku Markkanen. Hyperpriors for Matérn fields with applications in Bayesian inversion. Inverse Problems & Imaging, 13, 12 2016.
[13] Lassi Roininen, Janne MJ Huttunen, and Sari Lasanen. Whittle-Matérn priors for Bayesian statistical inversion with applications in electrical impedance tomography. Inverse Problems & Imaging, 8(2):561–586, 2014.
[14] Karla Monterrubio-Gómez, Lassi Roininen, Sara Wade, Theo Damoulas, and Mark Girolami. Posterior Inference for Sparse Hierarchical Non-stationary Models. 04 2018.
[15] Lassi Roininen, Petteri Piiroinen, Markku Lehtinen, et al. Constructing continuous stationary covariances as limits of the second-order stochastic difference equations. Inverse Problems & Imaging, 7(2):611–647, 2013.
[16] Gabriel J Lord, Catherine E Powell, and Tony Shardlow. An Introduction to Computational Stochastic PDEs. Number 50 in Cambridge texts in applied mathematics. Cambridge University Press, 2014.
[17] Håvard Rue and Leonhard Held. Gaussian Markov Random Fields: Theory and Applications. CRC Press, 2005.
[18] John B Walsh. An introduction to stochastic partial differential equations. In École d'Été de Probabilités de Saint Flour XIV - 1984, pages 265–439. Springer Berlin Heidelberg, 1986.
[19] Sadri Hassani. Dirac Delta Function. In Mathematical Methods, pages 289–319. Springer, 2000.

[20] Mark S Gockenbach. Partial Differential Equations: Analytical and Numerical Methods, volume 122. SIAM, 2005.
[21] Ivar Stakgold and Michael J Holst. Green's Functions and Boundary Value Problems, volume 99. John Wiley & Sons, 2011.
[22] Stanisław Saks. Theory of the Integral. Hafner Publishing Company, 1937.
[23] Ian Naismith Sneddon. Fourier Transforms. Courier Corporation, 1995.
[24] Mateusz Kwaśnicki. Ten equivalent definitions of the fractional Laplace operator. Fractional Calculus and Applied Analysis, 20(1):7–51, 2017.
[25] Robert Piessens. The Hankel Transform. In Alexander D Poularikas, editor, Transforms and Applications Handbook, chapter 9. CRC Press, Boca Raton, FL, 2000.
[26] Loukas Grafakos and Gerald Teschl. On Fourier transforms of radial functions and distributions. Journal of Fourier Analysis and Applications, 19(1):167–179, 2013.
[27] Harry Bateman. Tables of Integral Transforms [Volumes I & II]. McGraw-Hill, 1954.
[28] U Khristenko, L Scarabosio, P Swierczynski, E Ullmann, and B Wohlmuth. Analysis of Boundary Effects on PDE-Based Sampling of Whittle–Matérn Random Fields. SIAM/ASA Journal on Uncertainty Quantification, 7(3):948–974, 2019.
[29] Curtis R Vogel. Computational Methods for Inverse Problems. SIAM, 2002.
[30] Per Christian Hansen. Rank-Deficient and Discrete Ill-Posed Problems: Numerical Aspects of Linear Inversion, volume 4. SIAM, 2005.
[31] Majid Jafari Khaledi and Firoozeh Rivaz. Empirical Bayes spatial prediction using a Monte Carlo EM algorithm. Statistical Methods and Applications, 18(1):35–47, 2009.
[32] Christian Robert and George Casella. Monte Carlo Statistical Methods. Springer Science & Business Media, 2013.
[33] Oliver Schabenberger and Carol A Gotway. Statistical Methods for Spatial Data Analysis. CRC Press, 2017.
[34] Noel Cressie. Statistics for Spatial Data. John Wiley & Sons, 2015.
[35] W. Schwanghart. Experimental (Semi-) Variogram, 09 Jan 2013. MATLAB Central File Exchange. Retrieved 21 May 2018.
[36] W. Schwanghart. variogramfit, 14 Oct 2010. MATLAB Central File Exchange. Retrieved 21 May 2018.
[37] George Casella. An Introduction to Empirical Bayes Data Analysis. The American Statistician, 39(2):83–87, 1985.
[38] Dave Hale. Implementing an anisotropic and spatially varying Matérn model covariance with smoothing filters. 2013.
[39] Kathryn Anne Haskard. An anisotropic Matérn spatial covariance model: REML estimation and properties. PhD thesis, University of Adelaide, 2007.
[40] William G Jacoby. Loess: a nonparametric, graphical tool for depicting relationships between variables. Electoral Studies, 19(4):577–613, 2000.
[41] Gb11111. Arizona – the wave. Flickr. Retrieved 22 May 2019.
[42] David S Watkins. Fundamentals of Matrix Computations, volume 64. John Wiley & Sons, 2004.
