arXiv:1507.08403v1  [stat.ME]  30 Jul 2015
Statistical Science
2015, Vol. 30, No. 2, 176–180
DOI: 10.1214/15-STS518
Main article DOI: 10.1214/14-STS487
© Institute of Mathematical Statistics, 2015
When Doesn't Cokriging Outperform Kriging?
Hao Zhang and Wenxiang Cai
Abstract. Although cokriging should in theory yield a prediction variance no larger than that of kriging, this advantage is sometimes hard to see in practice. This should motivate theoretical studies on cokriging. In general, there is a lack of theoretical results for cokriging. In this work, we provide some theoretical results to compare cokriging with kriging by examining some explicit models and specific sampling schemes.
Key words and phrases: Cokriging, equivalence of probability measures, infill asymptotics, kriging.
Genton and Kleiber (2015) provided an excellent review of recent developments in multivariate covariance functions. In many situations, the ultimate objective of modeling the multivariate covariance function is to obtain superior prediction through cokriging. In theory, cokriging should have a prediction variance no larger than that of the kriging prediction. However, as the authors point out in the paper, the improvement of cokriging is sometimes very small or absent. In this note, we try to shed some light on this through some theoretical investigations.
For univariate Gaussian stationary processes, we now have a good understanding of the properties of kriging and statistical inferences. For example, theoretical results have been established to justify (i) that two different covariance functions may yield asymptotically equally optimal prediction (Stein, 1999), and (ii) that some parameters are not consistently estimable if the spatial domain is bounded (Zhang, 2004). We know the conditions under which a misspecified covariance function yields an asymptotically correct prediction and can exploit this fact to simplify computations (Zhang, 2004; Du, Zhang and Mandrekar, 2009).
Hao Zhang is Professor, Department of Statistics and Department of Forestry and Natural Resources, Purdue University, West Lafayette, Indiana 47907, USA, e-mail: zhanghao@purdue.edu. Wenxiang Cai is Graduate Student, School of International Trade and Economics, The University of International Business and Economics, Beijing, China.
This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in Statistical Science, 2015, Vol. 30, No. 2, 176–180. This reprint differs from the original in pagination and typographic detail.
We lack the analogous understanding for multivariate spatial models. There are no explicit theoretical results to answer the following questions:
• How important is the cross-covariance function? Specifically, could two different multivariate covariance functions yield an asymptotically equally optimal prediction?
• Which parameters are important to cokriging? We know which parameters are important to kriging.
• How much improvement does cokriging have over kriging?
One particular concept that has proved useful in the study of kriging is the equivalence of probability measures, owing to a theorem established by Blackwell and Dubins (1962). Let $s_i$, $i = 1,\ldots,n$, be sampling sites on a fixed domain (area) where the process $Y(s)$ is observed, and let $\{s_i, i > n\}$ be a set of sites on the same domain where $Y$ is to be predicted. If the two Gaussian measures $P_1$ and $P_2$ are equivalent on the $\sigma$-algebra generated by $Y(s_i)$, $i = 1,2,\ldots$, then with $P_1$-probability one,
\[
\sup_{A}\,\bigl|P_1\{A \mid Y(s_i), i = 1,\ldots,n\} - P_2\{A \mid Y(s_i), i = 1,\ldots,n\}\bigr| \to 0 \quad \text{as } n \to \infty,
\]
where the supremum is taken over $A \in \sigma\{Y(s_i), i > n\}$. The above result implies that the linear predictions under the two measures are asymptotically equally optimal (Stein, 1999).
This result can be readily extended to multivariate spatial processes and therefore implies that two cokriging predictors are asymptotically equally optimal under the two probability measures if the two Gaussian measures are equivalent. However, unlike in the univariate case, there are very few results on the equivalence of probability measures. Ruiz-Medina and Porcu (2015) gave some general conditions for equivalent measures for multivariate Gaussian processes, though there is still a lack of explicit examples where equivalent measures occur.
We now provide some sufficient conditions for the equivalence of Gaussian measures for a particular bivariate model. Let $\mathbf{Y}(s) = (Y_1(s), Y_2(s))'$ be a stationary bivariate Gaussian process with the following bivariate covariance function under the probability measure $P_k$, $k = 1,2$:
\[
C_{ij}(h) = \operatorname{Cov}(Y_i(s), Y_j(s+h)) = M(|h|, \sigma_{ij,k}, \alpha_k, \nu), \quad i,j = 1,2,
\]
where $M(\cdot, \sigma^2, \alpha, \nu)$ denotes the Matérn covariance function with variance $\sigma^2$, scale parameter $\alpha$ and smoothness parameter $\nu$. The following are sufficient conditions for the two measures $P_k$ to be equivalent on the $\sigma$-algebra generated by $\{Y_i(s), s \in D, i = 1,2\}$ for some bounded set $D \subset \mathbb{R}^d$, $d \le 3$:
\[
\begin{aligned}
\sigma_{ii,1}^2\,\alpha_1^{2\nu} &= \sigma_{ii,2}^2\,\alpha_2^{2\nu}, \quad i = 1,2,\\
\sigma_{12,1}/\sqrt{\sigma_{11,1}\sigma_{22,1}} &= \sigma_{12,2}/\sqrt{\sigma_{11,2}\sigma_{22,2}}.
\end{aligned}
\tag{1}
\]
To prove this claim, we employ the Karhunen–Loève expansion under measure $P_1$. The two processes $\{Y_i(s)/\sqrt{\sigma_{ii,1}}\}$, $i = 1,2$, have the same covariance function $M(|h|, a, \alpha, \nu)$ and therefore possess the same Karhunen–Loève expansion under measure $P_1$,
\[
\frac{Y_i(s)}{\sqrt{\sigma_{ii,1}}} = \sum_{l=1}^{\infty} \sqrt{\lambda_l}\, f_l(s)\, Z_{il},
\]
where for $i = 1,2$, $\{Z_{il}, l = 1,2,\ldots\}$ consists of i.i.d. standard normal random variables under measure $P_1$. Clearly, the eigenvalues $\lambda_l$ and eigenfunctions $f_l(s)$ depend only on the correlation function and hence do not depend on $i$. In addition,
\[
Z_{il} = \frac{1}{\sqrt{\lambda_l \sigma_{ii,1}}} \int_D Y_i(s) f_l(s)\, ds.
\]
Using the above expression, it is not hard to show that
\[
E_1(Z_{1l} Z_{2m}) = r\,\delta_{l,m} \tag{2}
\]
for $r = \sigma_{12,1}/\sqrt{\sigma_{11,1}\sigma_{22,1}}$, and
\[
E_2(Z_{1l} Z_{2m}) = r\,E_2(Z_{1l} Z_{1m}). \tag{3}
\]
The Karhunen–Loève expansion implies that $\{Z_{il}, l = 1,2,\ldots\}$ is a basis of the Hilbert space generated by $\{Y_i(s), s \in D\}$ with respect to measure $P_1$. Hence, $\{Z_{1l}, Z_{2l}, l = 1,2,\ldots\}$ is a basis of the Hilbert space generated by the two processes $\{Y_i(s), i = 1,2, s \in D\}$. The two measures are equivalent on the Hilbert space if and only if they are so on $\{Z_{1l}, Z_{2l}, l = 1,2,\ldots\}$ (Ibragimov and Rozanov, 1978, page 72). To show the equivalence of the two measures, we only need to verify (Stein, 1999, page 129)
\[
\sum_{i=1}^{2}\sum_{j=1}^{2}\sum_{l=1}^{\infty}\sum_{m=1}^{\infty} \bigl(E_1(Z_{il}Z_{jm}) - E_2(Z_{il}Z_{jm})\bigr)^2 < \infty. \tag{4}
\]
Because conditions (1) imply that the two measures are equivalent on $\{Y_i(s), s \in D\}$ (Zhang, 2004), we must have
\[
\sum_{l=1}^{\infty}\sum_{m=1}^{\infty} \bigl(E_1(Z_{il}Z_{im}) - E_2(Z_{il}Z_{im})\bigr)^2 < \infty, \quad i = 1,2.
\]
For $i \neq j$, equations (2) and (3), together with $E_1(Z_{1l}Z_{1m}) = \delta_{l,m}$, imply
\[
\sum_{l=1}^{\infty}\sum_{m=1}^{\infty} \bigl(E_1(Z_{1l}Z_{2m}) - E_2(Z_{1l}Z_{2m})\bigr)^2
= r^2 \sum_{l=1}^{\infty}\sum_{m=1}^{\infty} \bigl(E_1(Z_{1l}Z_{1m}) - E_2(Z_{1l}Z_{1m})\bigr)^2 < \infty.
\]
Therefore, (4) is proved and so is the sufficiency of the conditions. We now have an explicit example where two different bivariate covariance functions yield asymptotically equal cokriging results.
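The following is a minimal numerical sketch of this conclusion, not part of the original argument. It assumes the exponential special case $\nu = 1/2$ on the line, reads condition (1) as "marginal variance times $\alpha^{2\nu}$ is matched with the same cross-correlation $r$", and borrows a design in which $Y_2$ is observed at $\pm i/n$ and $Y_1$ only at the even multiples of $1/n$. The names `design`, `joint_cov` and `mse_ratio` and all parameter values are hypothetical. The ratio compares the mean squared error of the cokriging predictor built under the second model with the optimal one, both evaluated under the first model.

```python
import numpy as np

def design(n):
    """Sites for Y1 (even multiples of 1/n) and Y2 (all multiples), 0 excluded."""
    idx = np.concatenate([np.arange(-n, 0), np.arange(1, n + 1)])
    return idx[idx % 2 == 0] / n, idx / n

def joint_cov(t1, t2, V, alpha):
    """Covariance of (Y1 on t1, Y2 on t2) and its covariance with Y1(0)."""
    corr = lambda a, b: np.exp(-alpha * np.abs(a[:, None] - b[None, :]))
    zero = np.zeros(1)
    S = np.block([[V[0, 0] * corr(t1, t1), V[0, 1] * corr(t1, t2)],
                  [V[1, 0] * corr(t2, t1), V[1, 1] * corr(t2, t2)]])
    c = np.concatenate([V[0, 0] * corr(t1, zero)[:, 0],
                        V[0, 1] * corr(t2, zero)[:, 0]])
    return S, c

def mse_ratio(n, V1, alpha1, V2, alpha2):
    """MSE of the model-2 cokriging predictor over the optimal MSE, under model 1."""
    t1, t2 = design(n)
    S1, c1 = joint_cov(t1, t2, V1, alpha1)
    S2, c2 = joint_cov(t1, t2, V2, alpha2)
    w_opt = np.linalg.solve(S1, c1)   # optimal weights under the true model
    w_mis = np.linalg.solve(S2, c2)   # weights computed under the other model
    mse = lambda w: V1[0, 0] - 2 * w @ c1 + w @ S1 @ w
    return mse(w_mis) / mse(w_opt)

# Assumed parameter values: nu = 1/2, so variance * alpha is matched.
V1 = np.array([[1.0, 0.5], [0.5, 1.0]])
alpha1, alpha2 = 2.0, 4.0
V2 = V1 * alpha1 / alpha2             # same cross-correlation r = 0.5

for n in (10, 50):
    print(n, mse_ratio(n, V1, alpha1, V2, alpha2))
```

Under these assumptions the ratio should approach 1 as $n$ grows. Because the cokriging weights are unchanged when the whole matrix of $\sigma_{ij}$ is rescaled, the comparison here is driven by the change in $\alpha$.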
Next, we will try to explain why it is sometimes hard to see the improvement of cokriging over the kriging prediction. Consider a bivariate Gaussian process with mean 0 and exponential covariance functions such that
\[
C_{ij}(h) = \operatorname{Cov}(Y_i(s), Y_j(s+h)) = \sigma_{ij}\exp(-\alpha|h|), \quad i,j = 1,2. \tag{5}
\]

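Assume the two processes are observed at the same $n$ points $s_i$, $i = 1,\ldots,n$, and that we wish to predict $Y_1(s)$ at an unobserved site $s$, such as $s = 0$. Write $\mathbf{Y}_1 = (Y_1(s_i), i = 1,\ldots,n)'$, $\mathbf{Y}_2 = (Y_2(s_i), i = 1,\ldots,n)'$ and $\mathbf{Y} = (\mathbf{Y}_1', \mathbf{Y}_2')'$. It is known that in this case the cokriging predictor is identical to the kriging predictor. To see this, let $R$ denote the correlation matrix of $\mathbf{Y}_1$, which is also the correlation matrix of $\mathbf{Y}_2$. Then
\[
\operatorname{Cov}(\mathbf{Y}_i, \mathbf{Y}_j) = \sigma_{ij} R.
\]
Let $V$ be the matrix with $(i,j)$th element $\sigma_{ij}$. Then the covariance matrix of $\mathbf{Y}$ is $V \otimes R$. Let $k$ denote the vector of correlation coefficients between $Y_1(s)$, the variable to be predicted, and $\mathbf{Y}_1$. Since $(\sigma_{11}, \sigma_{12})$ is the first row of $V$, we have $(\sigma_{11}, \sigma_{12})V^{-1} = (1, 0)$, and therefore
\begin{align}
E(Y_1(s) \mid \mathbf{Y}_1, \mathbf{Y}_2) &= \bigl((\sigma_{11}, \sigma_{12}) \otimes k'\bigr)\bigl(V^{-1} \otimes R^{-1}\bigr)\mathbf{Y} \tag{6}\\
&= \bigl((1, 0) \otimes k'R^{-1}\bigr)\mathbf{Y} = k'R^{-1}\mathbf{Y}_1 = E(Y_1(s) \mid \mathbf{Y}_1). \tag{7}
\end{align}
Therefore, cokriging is identical to kriging and we should not expect any improvement of cokriging over kriging. We can also show that they are identical if $Y_2(s)$ is observed at a subset of the locations where $Y_1$ is observed.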
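The identity between the two predictors can be checked numerically. The sketch below is only an illustration under assumed values: the sites, the matrix of $\sigma_{ij}$ and $\alpha$ are arbitrary choices, and `colocated_weights` is a hypothetical helper name. It builds the covariance matrix $V \otimes R$ with `numpy.kron` and compares the cokriging weights with the kriging weights.

```python
import numpy as np

def colocated_weights(s, s0, V, alpha):
    """Kriging and cokriging weights for Y1(s0) when Y1, Y2 share the sites s."""
    R = np.exp(-alpha * np.abs(s[:, None] - s[None, :]))  # correlation matrix
    k = np.exp(-alpha * np.abs(s - s0))                   # correlations with Y1(s0)
    w_krig = np.linalg.solve(R, k)                        # k' R^{-1}
    Sigma = np.kron(V, R)                                 # Cov of (Y1', Y2')'
    c = np.kron(V[0], k)                                  # (sigma11 k', sigma12 k')
    w_cok = np.linalg.solve(Sigma, c)
    return w_krig, w_cok

# Assumed illustrative values.
s = np.linspace(-1.0, 1.0, 8)             # eight sites, none equal to 0
V = np.array([[2.0, 0.9], [0.9, 1.5]])    # positive definite (sigma_ij)
w_krig, w_cok = colocated_weights(s, 0.0, V, alpha=3.0)

print(np.allclose(w_cok[:len(s)], w_krig))  # weights on Y1 match kriging
print(np.allclose(w_cok[len(s):], 0.0))     # weights on Y2 vanish
```

The weights on $Y_2$ vanish up to rounding error, which is the numerical counterpart of $(\sigma_{11}, \sigma_{12})V^{-1} = (1, 0)$.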
One scenario where cokriging might outperform kriging is when the auxiliary variable is observed at more locations than the predicted variable. In the next example, we examine analytically which variables affect the improvement of cokriging over kriging. We assume the same bivariate model (5), with $Y_2(s)$ observed at $s \in O = \{i/n, i = \pm 1, \pm 2, \ldots, \pm n\}$ but $Y_1(s)$ observed only at half of the points, $s \in O_1 = \{2i/n, i = \pm 1, \pm 2, \ldots, \pm n/2\}$, where $n$ is an even integer. Denote the kriging predictor and cokriging predictor of $Y_1(0)$ by
\begin{align}
\hat{Y}_1(0) &= E(Y_1(0) \mid Y_1(s), s \in O_1), \tag{8}\\
\tilde{Y}_1(0) &= E(Y_1(0) \mid Y_1(s), s \in O_1, Y_2(t), t \in O). \tag{9}
\end{align}
We will derive the following asymptotic relative efficiency of kriging to cokriging:
\[
\lim_{n \to \infty} \frac{E(Y_1(0) - \tilde{Y}_1(0))^2}{E(Y_1(0) - \hat{Y}_1(0))^2} = 1 - r^2/2, \tag{10}
\]
where $r$ is the correlation coefficient of $Y_1(s)$ and $Y_2(s)$.
The asymptotic relative efficiency of kriging prediction does not depend on the scale parameter $\alpha$. Intuitively this is understandable. However, for a finite sample size $n$, $\alpha$ may affect the efficiency. We now present a simulation study to see how $\alpha$ and $r$ affect the relative efficiency of kriging prediction. We consider the exponential covariance model with $\sigma_{11} = \sigma_{22} = 1$, $r = 0.2$ and $0.5$, and $\alpha = 2$, $4$ and $8$. The auxiliary variable $Y_2$ is observed at $\pm i/n$, $i = 1,\ldots,n$, but the primary variable $Y_1$ is observed at $\pm i/n$ for even integers $0 < i \le n$. We calculate the prediction variance for predicting $Y_1(0)$ using both kriging and cokriging and obtain the relative efficiency of kriging for different $n$, $\alpha$ and $r$.
Figure 1 plots the relative efficiency for different $r$, $\alpha$ and $n$. We see that the relative efficiency of kriging decreases as $n$ increases, which means that cokriging is more likely to be seen to outperform kriging when $n$ is larger. When the spatial autocorrelation is strong (i.e., $\alpha$ is smaller), the asymptotic efficiency is reached faster (i.e., with $n$ not too large). This agrees with many other infill asymptotic results.
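The calculation described above can be sketched as follows. This is an assumed reconstruction of the setup rather than the authors' code: it takes $\sigma_{11} = \sigma_{22} = 1$, builds the exact covariance matrices for the two designs, and returns the ratio of the cokriging prediction variance to the kriging prediction variance. The function name `relative_efficiency` and the grid of $n$ values are hypothetical.

```python
import numpy as np

def relative_efficiency(n, alpha, r):
    """Cokriging over kriging prediction variance for Y1(0), unit variances."""
    if n % 2:
        raise ValueError("n must be even")
    idx = np.concatenate([np.arange(-n, 0), np.arange(1, n + 1)])
    t2 = idx / n                     # sites of Y2 (the set O)
    t1 = idx[idx % 2 == 0] / n       # sites of Y1 (the set O1)
    corr = lambda a, b: np.exp(-alpha * np.abs(a[:, None] - b[None, :]))
    zero = np.zeros(1)

    # Kriging: condition on Y1 at O1 only.
    R11 = corr(t1, t1)
    k1 = corr(t1, zero)[:, 0]
    var_krig = 1.0 - k1 @ np.linalg.solve(R11, k1)

    # Cokriging: condition on Y1 at O1 and Y2 at O.
    S = np.block([[R11, r * corr(t1, t2)],
                  [r * corr(t2, t1), corr(t2, t2)]])
    c = np.concatenate([k1, r * corr(t2, zero)[:, 0]])
    var_cok = 1.0 - c @ np.linalg.solve(S, c)
    return var_cok / var_krig

for r in (0.2, 0.5):
    for alpha in (2.0, 4.0, 8.0):
        values = [relative_efficiency(n, alpha, r) for n in (4, 10, 20, 50)]
        print(r, alpha, np.round(values, 4), "limit:", 1 - r**2 / 2)
```

Plotting the printed values against $n$ for each pair $(r, \alpha)$ gives curves of the kind described for Figure 1, with the horizontal reference line at $1 - r^2/2$.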
We now prove (10). We first note a Markovian property of the exponential model established by Du, Zhang and Mandrekar (2009), which says that $E(Y_1(s) \mid Y_1(t), t \in B)$ depends only on the two nearest neighbors of $s$ in a finite set $B$ such that $s$ is between the minimum and the maximum elements of $B$ (Du, Zhang and Mandrekar, 2009, Lemma 1). Also from the lemma, we obtain
\[
E(Y_1(0) - \hat{Y}_1(0))^2 = 2\sigma_{11}^2\alpha/n + o(n^{-2}).
\]
In the extreme case when $r = 1$, we can view the process $Y_1(s)$ as being observed at $O$, so that the nearest neighbors of 0 are at distance $1/n$ instead of $2/n$. Then in this extreme case, the above equation implies
\[
E(Y_1(0) - \tilde{Y}_1(0))^2 = \sigma_{11}^2\alpha/n + o(n^{-2}).
\]
The ratio in (10) is then clearly $1/2$. Hence, we have verified (10) for this extreme case. On the other hand, when $r = 0$, the two predictors $\hat{Y}_1(0)$ and $\tilde{Y}_1(0)$ are identical and (10) is obviously true.
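We are going to show that
\[
\tilde{Y}_1(0) = b_1 Y_1(-2/n) + b_2 Y_1(2/n) + b_3 Y_2(-2/n) + b_4 Y_2(-1/n) + b_5 Y_2(1/n) + b_6 Y_2(2/n), \tag{11}
\]
where
\begin{align}
b_1 = b_2 &= \frac{e^{-2\alpha/n}}{e^{-4\alpha/n} + 1}, \qquad b_3 = b_6 = \frac{-r e^{-2\alpha/n}}{e^{-4\alpha/n} + 1}, \tag{12}\\
b_4 = b_5 &= \frac{r e^{-\alpha/n}}{e^{-2\alpha/n} + 1}. \tag{13}
\end{align}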

Fig. 1. Relative efficiency of kriging to cokriging for different $r$, $\alpha$ and $n$. The solid horizontal line is the asymptotic relative efficiency $1 - r^2/2$.
Some straightforward calculation yields
\[
\begin{aligned}
E(Y_1(0) - \tilde{Y}_1(0))^2
&= -\sigma_{11}^2\,\frac{-2e^{-4\alpha/n}r^2 + e^{-6\alpha/n} + 2e^{-2\alpha/n}r^2 + e^{-4\alpha/n} - e^{-2\alpha/n} - 1}{(e^{-4\alpha/n} + 1)(e^{-2\alpha/n} + 1)}\\
&= \sigma_{11}^2(2 - r^2)\alpha/n + o(n^{-2}).
\end{aligned}
\]
Then (10) immediately follows. Hence, it is sufficient to show (11). It is possible to show that $Y_1(0) - \tilde{Y}_1(0)$ is uncorrelated with any $Y_1(s)$, $s \in O_1$, and with any $Y_2(t)$, $t \in O$. Hence, $\tilde{Y}_1(0)$ must be the best linear prediction. Here we take an alternative but more intuitive approach. We will apply the Markovian property of the Gaussian exponential model to show that $\tilde{Y}_1(0)$ depends only on $Y_1(-2/n)$, $Y_1(2/n)$, $Y_2(-2/n)$, $Y_2(-1/n)$, $Y_2(1/n)$ and $Y_2(2/n)$. Consequently, the coefficients $b_i$ in (12) and (13) can be found by solving linear equations.
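As a numerical sanity check of (11) to (13), and not a substitute for the proof, one can solve the full cokriging system and compare the resulting weights with the closed-form coefficients. The sketch assumes $\sigma_{11} = \sigma_{22} = 1$; the helper names `cokriging_weights` and `closed_form` and the values of $n$, $\alpha$ and $r$ are hypothetical.

```python
import numpy as np

def cokriging_weights(n, alpha, r):
    """Full cokriging weights for Y1(0), keyed by the integer i of the site i/n."""
    idx = np.concatenate([np.arange(-n, 0), np.arange(1, n + 1)])
    i1 = idx[idx % 2 == 0]                     # indices of the Y1 sites (O1)
    t1, t2 = i1 / n, idx / n
    corr = lambda a, b: np.exp(-alpha * np.abs(a[:, None] - b[None, :]))
    zero = np.zeros(1)
    S = np.block([[corr(t1, t1), r * corr(t1, t2)],
                  [r * corr(t2, t1), corr(t2, t2)]])
    c = np.concatenate([corr(t1, zero)[:, 0], r * corr(t2, zero)[:, 0]])
    w = np.linalg.solve(S, c)
    w1 = dict(zip(i1.tolist(), w[:len(i1)]))   # weights on Y1(i/n)
    w2 = dict(zip(idx.tolist(), w[len(i1):]))  # weights on Y2(i/n)
    return w1, w2

def closed_form(n, alpha, r):
    """Coefficients (b1 = b2, b3 = b6, b4 = b5) from (12) and (13)."""
    x, q = np.exp(-2 * alpha / n), np.exp(-alpha / n)
    return x / (x**2 + 1), -r * x / (x**2 + 1), r * q / (q**2 + 1)

w1, w2 = cokriging_weights(10, 4.0, 0.5)
b_y1, b_far, b_near = closed_form(10, 4.0, 0.5)
print(w1[-2], w1[2], b_y1)      # b1, b2
print(w2[-2], w2[2], b_far)     # b3, b6
print(w2[-1], w2[1], b_near)    # b4, b5
```

All remaining weights should be zero up to rounding error, which reflects the claim that only the six nearest observations enter the predictor.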
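For any odd integer $i$ between $-n$ and $n$,
\[
\begin{aligned}
&E(Y_2(i/n) \mid Y_1(s), s \in O_1, Y_2(t), t \in O, t \neq i/n)\\
&\quad= E\{E(Y_2(i/n) \mid Y_1(t), Y_2(t), t \in O, t \neq i/n) \mid Y_1(s), s \in O_1, Y_2(t), t \in O, t \neq i/n\}\\
&\quad= E\{E(Y_2(i/n) \mid Y_2(t), t \in O, t \neq i/n) \mid Y_1(s), s \in O_1, Y_2(t), t \in O, t \neq i/n\}\\
&\quad= E\{Y_2(i/n) \mid Y_2(t_{i-}), Y_2(t_{i+})\},
\end{aligned}
\tag{14}
\]
where $t_{i-}$ and $t_{i+}$ are the two nearest neighbors of $i/n$ in $O$. For example, for $i = -1$, $t_{i-} = -2/n$ and $t_{i+} = 1/n$.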
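Define $e_i = Y_2(i/n) - E\{Y_2(i/n) \mid Y_2(t_{i-}), Y_2(t_{i+})\}$ for an odd $i$. Then $e_i$ is independent of $Y_1(s)$, $s \in O_1$, and of $Y_2(t)$, $t \in O$, $t \neq i/n$. Consequently,
\[
\begin{aligned}
E(Y_1(0) \mid Y_1(s), s \in O_1, Y_2(t), t \in O)
&= E(Y_1(0) \mid Y_1(s), Y_2(s), s \in O_1, e_i, i \text{ odd})\\
&= E(Y_1(0) \mid Y_1(s), Y_2(s), s \in O_1) + E(Y_1(0) \mid e_i, i \text{ odd}).
\end{aligned}
\tag{15}
\]
In the first term of the above equation, $Y_1$ and $Y_2$ are observed at the same sites $O_1$, so this term coincides with the kriging predictor (8), which depends only on $Y_1(-2/n)$ and $Y_1(2/n)$ due to the Markovian property. For the second term, because the cross-covariance function is proportional to the covariance function of $Y_2(t)$, we have
\[
E(Y_1(0) \mid e_i, i \text{ odd}) = r\,E(Y_2(0) \mid e_i, i \text{ odd}).
\]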
Applying again the property of conditional expectation and the Markovian property, we get
\[
\begin{aligned}
E(Y_2(0) \mid e_i, i \text{ odd})
&= E(E\{Y_2(0) \mid Y_2(t), t \in O\} \mid e_i, i \text{ odd})\\
&= \beta\,E\{Y_2(-1/n) + Y_2(1/n) \mid e_i, i \text{ odd}\}\\
&= \beta\,E\{Y_2(-1/n) + Y_2(1/n) \mid e_{-1}, e_1\},
\end{aligned}
\]
where $\beta$ is the constant in $E(Y_2(0) \mid Y_2(-1/n), Y_2(1/n)) = \beta(Y_2(-1/n) + Y_2(1/n))$, and the last equation follows from the fact that $e_i$ is independent of $Y_2(1/n)$ and $Y_2(-1/n)$ if $i \neq \pm 1$. Therefore, the second term of (15) is a linear function of $e_{-1}$ and $e_1$ and hence a linear function of $Y_2(i/n)$, $i = -2, -1, 1$ and $2$.
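The independence of $e_i$ from the remaining observations of $Y_2$ used in this argument can be checked numerically; since the variables are jointly Gaussian, zero covariance is enough. The sketch below is an illustration under an assumed unit variance for $Y_2$, with a hypothetical helper name `residual_covariances`; it computes $\operatorname{Cov}(e_i, Y_2(t))$ for every $t \in O$ with $t \neq i/n$. The covariances with $Y_1(s)$ are $r$ times these values under model (5) with unit variances.

```python
import numpy as np

def residual_covariances(n, alpha, i):
    """Cov(e_i, Y2(t)) over all t in O except i/n, for an odd i with |i| < n."""
    idx = np.concatenate([np.arange(-n, 0), np.arange(1, n + 1)])
    others = idx[idx != i]
    lo = others[others < i].max()        # index of the left neighbor t_{i-}
    hi = others[others > i].min()        # index of the right neighbor t_{i+}
    rho = lambda a, b: np.exp(-alpha * np.abs(a - b) / n)
    nb = np.array([lo, hi])
    G = rho(nb[:, None], nb[None, :])    # correlation of the two neighbors
    w = np.linalg.solve(G, rho(nb, i))   # weights of E{Y2(i/n) | neighbors}
    # Cov(Y2(i/n), Y2(t)) minus the part explained by the two neighbors.
    return rho(i, others) - w @ rho(nb[:, None], others[None, :])

print(np.abs(residual_covariances(10, 2.0, -1)).max())  # neighbors -2/n and 1/n
print(np.abs(residual_covariances(10, 2.0, 3)).max())   # neighbors 2/n and 4/n
```

Both printed values should be zero up to rounding error, including the case $i = -1$ where the two neighbors are at unequal distances because 0 is not an observation site.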
ACKNOWLEDGMENT
Hao Zhang is supported by a grant from the China Social Science Foundation (11&ZD167) and NSF Grant (IIS-1028291). Wenxiang Cai is supported by the Excellent Dissertation Fund of the University of International Business and Economics.
REFERENCES
Blackwell, D. and Dubins, L. (1962). Merging of opinions with increasing information. Ann. Math. Statist. 33 882–886. MR0149577
Du, J., Zhang, H. and Mandrekar, V. S. (2009). Fixed-domain asymptotic properties of tapered maximum likelihood estimators. Ann. Statist. 37 3330–3361. MR2549562
Genton, M. G. and Kleiber, W. (2015). Cross-covariance functions for multivariate geostatistics. Statist. Sci. 30 147–163.
Ibragimov, I. A. and Rozanov, Y. A. (1978). Gaussian Random Processes. Springer, New York. MR0543837
Ruiz-Medina, M. D. and Porcu, E. (2015). Equivalence of Gaussian measures of multivariate random fields. Stoch. Environ. Res. Risk Assess. 29 325–334.
Stein, M. L. (1999). Interpolation of Spatial Data: Some Theory for Kriging. Springer, New York. MR1697409
Zhang, H. (2004). Inconsistent estimation and asymptotically equal interpolations in model-based geostatistics. J. Amer. Statist. Assoc. 99 250–261. MR2054303
