Accounting for Multiple Covariates in Non-Stationary Geostatistical Modelling
Olatunji Johnson^a, Bedilu A Ejigu^b, Ezra Gayawan^c
^a Department of Mathematics, University of Manchester, Manchester, UK
^b Department of Statistics, Addis Ababa University, Addis Ababa, Ethiopia
^c Department of Statistics, Federal University of Technology, Akure, Nigeria
Abstract
Model-based geostatistics (MBG) is a subfield of spatial statistics focused on predicting spatially continuous phenomena using data collected at discrete locations. Geostatistical models often rely on the assumptions of stationarity and isotropy for practical and conceptual simplicity. An alternative is to allow for non-stationarity, in which the statistical properties of the process vary across the study area. Previous work has explored non-stationary processes, including approaches that use covariate information to induce non-stationarity; this research extends these ideas by incorporating multiple covariates and proposing different ways of constructing non-stationary processes. A simulation study demonstrates the importance of selecting the appropriate non-stationary process. The proposed approach is then applied to malaria prevalence data from Mozambique, showcasing its practical utility.
Keywords: Disease mapping, geostatistics, non-stationarity, covariance function, covariates.
1. Introduction
Model-based geostatistics (MBG) (Diggle et al., 1998) is a field within spatial statistics that offers valuable tools for making spatially continuous inferences. Specifically, it allows us to predict spatially continuous phenomena from data collected at discrete locations within a defined region of interest. MBG follows a principled, likelihood-based approach and leverages the "first law of geography", which states that nearby locations tend to be more strongly correlated than distant ones. This enables geostatistical models to borrow information across space to estimate values at any location within the study area. Because it is well suited to low-resource settings where disease registries may be lacking, MBG has found increasing application in epidemiological studies conducted in developing countries. Notably, MBG has been instrumental in mapping various infectious diseases, including malaria (Ejigu and Moraga, 2023), Loa loa (Johnson et al., 2022), trachoma (Amoah et al., 2022), soil-transmitted helminths (Mogaji et al., 2022), and onchocerciasis. These applications have provided valuable insights for monitoring disease burden using data from household surveys.
Let $Y_i$ denote the outcome measured at location $x_i$, $i = 1, \ldots, n$. The standard linear geostatistical model takes the form

$$Y_i = d^\top(x_i)\beta + S(x_i) + Z_i, \qquad i = 1, \ldots, n, \tag{1}$$

where $d(x)$ is a vector of explanatory variables at location $x$ with associated regression coefficients $\beta$, and the $Z_i$, often referred to as the nugget effect, are independent and identically distributed (i.i.d.) Gaussian random variables with mean zero and variance $\tau^2$. The common assumption on $S(x)$ is that it is a zero-mean Gaussian process that is stationary and isotropic, implying that
$$\mathrm{Cov}\{S(x), S(x')\} = \sigma^2 \rho(u; \phi),$$

where $\sigma^2$ is the variance of the process $S$, $u = \lVert x - x' \rVert$ is the Euclidean distance between locations $x$ and $x'$, and $\rho(u; \phi)$ is a correlation function with parameters $\phi$. We usually specify $\rho(u; \phi)$ to be a member of the Matérn family (Matérn, 1960).

Email addresses: olatunji.johnson@manchester.ac.uk (Olatunji Johnson), bedilu.alamirie@aau.edu.et (Bedilu A Ejigu), egayawan@futa.edu.ng (Ezra Gayawan)
Preprint submitted to Spatial Statistics
December 13, 2024
arXiv:2412.09225v1  [stat.ME]  12 Dec 2024
The assumptions of stationarity and isotropy are common in geostatistical models for both practical and conceptual reasons. Under these assumptions the covariance function depends only on the distance between locations, which simplifies model specification and allows efficient estimation of the model parameters. However, these assumptions may not hold in real-world settings. As an alternative, we can assume non-stationarity, in which the statistical properties of the process vary across the study area; for example, the mean and/or variance of the process may vary spatially with covariates or other factors. Such models are particularly useful when there are evident spatial trends or gradients in the data.
Several models for non-stationary processes have been proposed (Sampson and Guttorp, 1992; Higdon, 1998; Paciorek and Schervish, 2006; Schmidt et al., 2011). The space-deformation method of Sampson and Guttorp (1992) was a pioneering contribution to non-stationary covariance modelling. Its basic idea is to transform the geographic region D into a new region G on which stationarity and isotropy hold. However, the space-deformation method has two limitations: (i) it cannot quantify the uncertainty introduced by estimating the mapping from D to G, and (ii) the estimated mapping is often not one-to-one and can fold over itself.
The process-convolution method (Higdon, 1998) is among the most popular ways of constructing a non-stationary process, because it is specified through a kernel function rather than directly through a covariance function. The approach convolves a stationary process with a spatially varying kernel to obtain a non-stationary process. However, it is highly parameterized and therefore difficult to fit, since non-stationary behaviour is hard to estimate from a single realization of a spatial process.
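The process-convolution idea can be illustrated with a minimal discretised sketch in Python (NumPy); it is not Higdon's (1998) implementation. White noise on a grid of knots is smoothed by Gaussian kernels whose bandwidth changes with location. The knot grid and the bandwidth rule `bandwidth` (wider kernels as the x-coordinate grows) are hypothetical choices made only for illustration. Because every location has its own kernel, the implied covariance matrix K Kᵀ is positive semi-definite by construction but is no longer a function of distance alone.

```python
import numpy as np

def kernel_matrix(locs, knots, bandwidth_fn):
    """K[i, m]: Gaussian kernel centred at locs[i], with a location-specific
    bandwidth, evaluated at knot m."""
    h = bandwidth_fn(locs)                                   # one bandwidth per location
    d2 = ((locs[:, None, :] - knots[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / h[:, None] ** 2)

def bandwidth(locs):
    # Hypothetical rule: the process becomes smoother towards larger x.
    return 0.05 + 0.15 * locs[:, 0]

rng = np.random.default_rng(1)
grid = np.linspace(0.0, 1.0, 20)
knots = np.array([(a, b) for a in grid for b in grid])       # 400 knots on [0, 1]^2
locs = rng.uniform(size=(50, 2))                             # 50 locations

K = kernel_matrix(locs, knots, bandwidth)
cov = K @ K.T                      # implied Cov{S(x_i), S(x_j)} for unit-variance white noise
S = K @ rng.standard_normal(len(knots))                      # one realisation of S

# The marginal variance diag(cov) changes with the kernel width, i.e. with location.
print(np.diag(cov).min(), np.diag(cov).max())
```

Since the kernels are not normalised here, wider kernels also raise the marginal variance; a real implementation would choose how bandwidth and variance are linked.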
Other approaches incorporate covariates into the covariance structure of a spatial process (Schmidt et al., 2011; Ejigu et al., 2020), using the covariate information to handle non-stationarity in different ways. Schmidt et al. (2011) relax the assumption of a stationary Gaussian process by including covariate information in the covariance structure, which allows the latent space in the deformation model of Sampson and Guttorp (1992) to have dimension D > 2. Ejigu et al. (2020) directly incorporate spatially referenced covariates into the covariance function, but their approach is confined to a single covariate. Because different parts of the spatial domain may be influenced by different sets of covariates, incorporating multiple covariates enhances the model's flexibility to adapt locally, allowing it to capture diverse variations across the spatial domain.
The main contribution of this paper is an extension of the approach of Ejigu et al. (2020) that allows multiple covariates to enter the covariance of the spatial process. Specifically, we consider ways of combining several correlation functions, either by multiplying them or by adding them; because products and sums of valid covariance functions are themselves valid (positive semi-definite), both routes yield a valid covariance function.
The rest of the paper is structured as follows. Section 2 reviews the existing approach and introduces our proposed extension. In Section 3, we assess the robustness of the proposed approach through a simulation study. Section 4 applies the proposed approach to malaria prevalence data from Mozambique. Finally, Section 5 discusses the proposed method and potential avenues for further methodological extensions.
2. Review of existing methods and the proposed method
Non-stationary geostatistical model with one covariate
Ejigu et al. (2020) proposed a non-stationary geostatistical model with one covariate. In their approach, the stationary Gaussian process in Equation (1) is replaced by a Gaussian process defined over both space $x$ and a covariate $e$, denoted $S(x_i, e_i)$. The model then takes the form

$$Y_i = d^\top(x_i)\beta + S(x_i, e_i) + Z_i, \tag{2}$$

where $S(x, e)$ is a zero-mean Gaussian process with covariance function

$$\mathrm{Cov}\{S(x, e), S(x', e')\} = \sigma^2 \rho(x, x'; e, e'; \phi).$$

They then assumed a separable correlation function, in which the space-covariate correlation function is decomposed into the product of a purely spatial and a purely covariate-dependent correlation function, $\rho(x, x'; e, e'; \phi) = \rho_1(x, x'; \phi_1)\,\rho_2(e, e'; \phi_2)$, where $\rho_1(x, x'; \phi_1)$ is the spatial correlation function and $\rho_2(e, e'; \phi_2)$ is the correlation function associated with the covariate. They noted that the covariate $e$ can be included in the fixed part of the model as well as in the covariance function. In realistic settings, however, most spatial data are accompanied by more than one covariate, and the aim is to investigate the influence of these covariates on the spatial outcome. It is therefore natural to extend the method to include more covariates.
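As a minimal sketch of the separable structure ρ1(x, x′; ϕ1)ρ2(e, e′; ϕ2), the Python (NumPy) code below assumes, for brevity, an exponential correlation (the Matérn case κ = 0.5) for both factors; the function names are hypothetical. The example shows the key consequence of the construction: two pairs of sites at the same spatial distance can have very different covariances when their covariate values differ.

```python
import numpy as np

def exp_corr(u, phi):
    """Exponential correlation exp(-u / phi) (Matern with kappa = 0.5)."""
    return np.exp(-u / phi)

def separable_cov(X, e, sigma2, phi1, phi2):
    """sigma2 * rho1(||x - x'||; phi1) * rho2(|e - e'|; phi2) for all pairs."""
    u_space = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    u_covariate = np.abs(e[:, None] - e[None, :])
    return sigma2 * exp_corr(u_space, phi1) * exp_corr(u_covariate, phi2)

# Sites 1 and 2 share a location; site 0 is 0.1 away from both.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.1, 0.0]])
e = np.array([0.0, 0.0, 1.0])          # site 2 has a different covariate value
C = separable_cov(X, e, sigma2=0.5, phi1=0.3, phi2=0.2)
print(C[0, 1], C[0, 2])                # same spatial distance, far smaller covariance for (0, 2)
```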
Non-stationary geostatistical model with more than one covariate
Here we propose an extension of the method above that allows more than one covariate to interact with the spatial component. This is important for capturing local spatial heterogeneity and addressing possible complexities in the data. Given that different parts of the spatial domain may be influenced by different sets of covariates, using multiple covariates enables the model to adapt locally, capturing diverse influences in different regions. For convenience, we restrict ourselves to two covariates and later show how this can be generalised to more than two. We therefore consider the stochastic process $S(x, e, t)$, where $e$ and $t$ are covariates, modelled as a zero-mean Gaussian process with covariance function

$$\mathrm{Cov}\{S(x, e, t), S(x', e', t')\} = \sigma^2 \rho(x, x'; e, e'; t, t'; \phi).$$

We consider three decompositions of this covariance function:

$$\mathrm{Cov}\{S(x, e, t), S(x', e', t')\} = \sigma^2 \rho_1(x, x'; \phi_1)\,\rho_2(e, e'; \phi_2)\,\rho_3(t, t'; \phi_3), \tag{3}$$

$$\mathrm{Cov}\{S(x, e, t), S(x', e', t')\} = \sigma^2 \left\{\rho_1(x, x'; \phi_1)\,\rho_2(e, e'; \phi_2) + \rho_1(x, x'; \phi_1)\,\rho_3(t, t'; \phi_3)\right\}, \tag{4}$$

and

$$\mathrm{Cov}\{S(x, e, t), S(x', e', t')\} = \sigma^2 \left\{\rho_1(x, x'; \phi_1) + \rho_2(e, e'; \phi_2) + \rho_3(t, t'; \phi_3)\right\}, \tag{5}$$

where $\rho_1(x, x'; \phi_1)$ is the spatial correlation function, $\rho_2(e, e'; \phi_2)$ is the correlation function associated with covariate $e$, and $\rho_3(t, t'; \phi_3)$ is the correlation function associated with covariate $t$. The covariance structures in Equations (3), (4) and (5) correspond to the covariance structure of $S(\cdot)$ in the models named Model 1, Model 2 and Model 3, respectively:
Model 1
$$Y_i = d^\top(x_i)\beta + S(x_i, e_i, t_i) + Z_i, \tag{6}$$
Model 2
$$Y_i = d^\top(x_i)\beta + S(x_i, e_i) + S(x_i, t_i) + Z_i, \tag{7}$$
Model 3
$$Y_i = d^\top(x_i)\beta + S(x_i) + S(e_i) + S(t_i) + Z_i. \tag{8}$$
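To see how the three decompositions differ in practice, the sketch below (Python, NumPy) builds the covariance matrices of Equations (3), (4) and (5) for simulated locations and covariates. It uses the Matérn κ = 1.5 correlation (1 + u/ϕ)exp(−u/ϕ) for all three factors, which is the form the Matérn expression defined in the next paragraph takes at κ = 1.5; the helper names are hypothetical. All three matrices are positive semi-definite, since products and sums of valid correlation matrices are valid. The diagonal, i.e. the marginal variance of S, is σ² under Equation (3) but 2σ² under Equation (4) and 3σ² under Equation (5), because every correlation term equals 1 at zero separation.

```python
import numpy as np

def matern15(u, phi):
    """Matern correlation with kappa = 1.5: (1 + u/phi) * exp(-u/phi)."""
    z = u / phi
    return (1.0 + z) * np.exp(-z)

def dist(a):
    """Pairwise distances: |a - a'| for a scalar covariate, Euclidean for locations."""
    a = np.asarray(a, dtype=float)
    if a.ndim == 1:
        return np.abs(a[:, None] - a[None, :])
    return np.linalg.norm(a[:, None, :] - a[None, :, :], axis=-1)

def cov_models(X, e, t, sigma2, phi1, phi2, phi3):
    """Covariance matrices of S(.) under Equations (3), (4) and (5)."""
    r1 = matern15(dist(X), phi1)
    r2 = matern15(dist(e), phi2)
    r3 = matern15(dist(t), phi3)
    return {
        "model1": sigma2 * r1 * r2 * r3,           # Eq. (3): product
        "model2": sigma2 * (r1 * r2 + r1 * r3),    # Eq. (4): sum of two products
        "model3": sigma2 * (r1 + r2 + r3),         # Eq. (5): sum
    }

rng = np.random.default_rng(2024)
n = 200
X = rng.uniform(size=(n, 2))
e, t = rng.uniform(-1.0, 1.0, size=(2, n))
covs = cov_models(X, e, t, sigma2=0.5, phi1=0.3, phi2=0.2, phi3=0.1)

for name, C in covs.items():
    print(name, "variance:", C[0, 0], "min eigenvalue:", np.linalg.eigvalsh(C).min())
```

This implies that σ² is directly comparable across the three models only after accounting for the number of summed terms.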
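We assume that $\rho_1$, $\rho_2$ and $\rho_3$ are Matérn correlation functions (Matérn, 1960), defined as

$$\rho(u) = \frac{1}{\Gamma(\kappa)\,2^{\kappa-1}} \left(\frac{u}{\phi}\right)^{\kappa} K_\kappa\!\left(\frac{u}{\phi}\right),$$

where $K_\kappa(\cdot)$ denotes the modified Bessel function of the third kind (also called the modified Bessel function of the second kind) of order $\kappa$; $u$ is $\lVert x - x' \rVert$, the Euclidean distance between locations $x$ and $x'$, for $\rho_1$, $|e - e'|$, the absolute difference between covariate values $e$ and $e'$, for $\rho_2$, and $|t - t'|$ for $\rho_3$; and $\phi$ is the scale parameter, which controls the rate at which the correlation decays towards zero with increasing separation $u$. According to Zhang (2004), $\kappa$ can be difficult to estimate in practice because doing so requires a large amount of data. We therefore fixed $\kappa = 1.5$ in the simulation study. Under the parameterisation above, this gives

$$\rho(u) = \left(1 + \frac{u}{\phi}\right)\exp\left(-\frac{u}{\phi}\right);$$

the frequently quoted form $\rho(u) = (1 + \sqrt{3}\,u/\phi)\exp(-\sqrt{3}\,u/\phi)$ corresponds to the alternative parameterisation in which $u/\phi$ is replaced by $\sqrt{2\kappa}\,u/\phi$.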
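The two closed forms can be checked numerically against the general Matérn expression. The sketch below (Python, with NumPy and SciPy's `scipy.special.kv`, the modified Bessel function of the second kind, and `scipy.special.gamma`) implements the formula above for array input, setting ρ(0) = 1 by continuity; the function name `matern` is our own. It confirms that, in the parameterisation used in the text, κ = 1.5 gives (1 + u/ϕ)exp(−u/ϕ), while the √3 form is recovered after rescaling the distance by √3 = √(2κ).

```python
import numpy as np
from scipy.special import gamma, kv

def matern(u, phi, kappa):
    """rho(u) = (u/phi)^kappa K_kappa(u/phi) / (Gamma(kappa) 2^(kappa - 1)), rho(0) = 1."""
    z = np.asarray(u, dtype=float) / phi
    out = np.ones_like(z)
    pos = z > 0
    out[pos] = z[pos] ** kappa * kv(kappa, z[pos]) / (gamma(kappa) * 2 ** (kappa - 1))
    return out

u = np.linspace(0.0, 2.0, 41)
phi = 0.3
closed_form = (1 + u / phi) * np.exp(-u / phi)                        # kappa = 1.5
sqrt3_form = (1 + np.sqrt(3) * u / phi) * np.exp(-np.sqrt(3) * u / phi)

print(np.max(np.abs(matern(u, phi, 1.5) - closed_form)))   # at machine-precision level
print(np.allclose(matern(np.sqrt(3) * u, phi, 1.5), sqrt3_form))   # True
```

Because the two parameterisations differ only by a rescaling of ϕ, reported ϕ estimates should always be read together with the parameterisation used.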

Inference: maximum likelihood estimation
Maximum likelihood estimation (MLE) is a widely used technique for estimating model parameters. It seeks the parameter values under which the observed data are most probable, by maximizing the likelihood function, which measures how well the chosen distribution and its parameters account for the observed data. Here the likelihood is that of a multivariate normal distribution. Specifically, let $\theta = (\beta, \sigma^2, \phi_1, \phi_2, \phi_3, \tau^2)^\top$ denote the vector of parameters and $y^\top = (y_1, \ldots, y_n)$ the observed data. The log-likelihood function is

$$\ell(\theta) = \ln L(\theta) = -\frac{n}{2}\ln(2\pi) - \frac{1}{2}\ln|\Sigma| - \frac{1}{2}(y - D\beta)^\top \Sigma^{-1} (y - D\beta),$$

where $D$ is the $n \times p$ matrix of explanatory variables and $\Sigma = \sigma^2 R + \tau^2 I_n$ is the $n \times n$ covariance matrix of $y$; the $(i, j)$ entry of $R$ is the term multiplying $\sigma^2$ in Equation (3), (4) or (5), evaluated for observations $i$ and $j$. We used numerical optimization algorithms in R (R Core Team, 2023) to obtain the maximum likelihood estimate $\hat\theta$.
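A minimal Python (NumPy) sketch of evaluating ℓ(θ) is given below; it is not the authors' R code. It assumes Σ = σ²R + τ²I as above, computes the quadratic form and log-determinant through a Cholesky factor instead of an explicit inverse, and includes the generalised least-squares value of β that maximises ℓ for a fixed Σ, a common device for profiling out β before a numerical optimiser searches over the covariance parameters. The helper names and the toy exponential-correlation example are hypothetical.

```python
import numpy as np

def gaussian_loglik(y, D, beta, Sigma):
    """-n/2 ln(2 pi) - 1/2 ln|Sigma| - 1/2 (y - D beta)' Sigma^{-1} (y - D beta)."""
    n = len(y)
    L = np.linalg.cholesky(Sigma)                  # Sigma = L L'
    z = np.linalg.solve(L, y - D @ beta)           # z'z equals the quadratic form
    logdet = 2.0 * np.sum(np.log(np.diag(L)))      # ln|Sigma| from the Cholesky diagonal
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + z @ z)

def gls_beta(y, D, Sigma):
    """beta_hat = (D' Sigma^{-1} D)^{-1} D' Sigma^{-1} y, the maximiser over beta."""
    Sinv_D = np.linalg.solve(Sigma, D)
    return np.linalg.solve(D.T @ Sinv_D, Sinv_D.T @ y)

# Toy example: exponential spatial correlation, sigma2 = 0.5, tau2 = 0.1 (hypothetical).
rng = np.random.default_rng(0)
n = 30
X = rng.uniform(size=(n, 2))
R = np.exp(-np.linalg.norm(X[:, None] - X[None], axis=-1) / 0.3)
Sigma = 0.5 * R + 0.1 * np.eye(n)
D = np.column_stack([np.ones(n), rng.normal(size=n)])
y = D @ np.array([1.0, 0.5]) + np.linalg.cholesky(Sigma) @ rng.standard_normal(n)

beta_hat = gls_beta(y, D, Sigma)
print(gaussian_loglik(y, D, beta_hat, Sigma))
```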
3. Simulation Study
The purpose of the simulation study is twofold: to evaluate the effects of misspecifying the non-stationary covariance function on (1) the estimation of the parameters $\theta$ and (2) the spatial prediction of the outcome $Y$.

We consider three data-generating mechanisms, corresponding to Model 1, Model 2 and Model 3. In all datasets, we simulate $n = 700$ observations and set the parameters to $\theta = (\beta_0, \beta_1, \beta_2, \sigma^2, \phi_1, \phi_2, \phi_3)^\top = (1, 0.5, -0.5, 0.5, 0.3, 0.2, 0.1)^\top$. The domain is $[0, 1] \times [0, 1]$, and sampling locations are drawn from a uniform distribution on this square. Two predictors are additionally simulated from the uniform distribution $\mathrm{Unif}[-1, 1]$. The process $S$ is then sampled from a multivariate normal distribution with mean zero and covariance matrix as specified in Equation (3), (4) or (5), and Models 1, 2 and 3 are fitted to each simulated dataset and evaluated.
Three scenarios are created for the analysis:
• In Scenario 1, data were simulated from Model 1, and the analysis was performed using Models 1, 2, and 3.
• In Scenario 2, data were simulated from Model 2, and the analysis was performed using Models 1, 2, and 3.
• In Scenario 3, data were simulated from Model 3, and the analysis was performed using Models 1, 2, and 3.
Therefore, Model 1 is correctly specified in Scenario 1, Model 2 is correctly specified in Scenario 2, and Model 3 is
correctly specified in Scenario 3.
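The data-generating step for Model 1 can be sketched in Python (NumPy) as follows, using the parameter values listed above and the κ = 1.5 correlation (1 + u/ϕ)exp(−u/ϕ). Several details are our assumptions rather than reported settings: the two predictors enter the mean with coefficients β1 and β2 as well as the covariance, the nugget variance `tau2` is a hypothetical value because none is listed, a small jitter is added to the diagonal for a stable Cholesky factorisation, and n is reduced from 700 to keep the example quick. Data from Models 2 and 3 differ only in the line that builds R.

```python
import numpy as np

def matern15(u, phi):
    """Matern correlation with kappa = 1.5: (1 + u/phi) * exp(-u/phi)."""
    z = u / phi
    return (1.0 + z) * np.exp(-z)

def simulate_model1(n, beta, sigma2, phi, tau2, rng):
    """One dataset from Model 1: Y = beta0 + beta1 e + beta2 t + S(x, e, t) + Z."""
    X = rng.uniform(size=(n, 2))                           # locations on [0, 1]^2
    e, t = rng.uniform(-1.0, 1.0, size=(2, n))             # predictors ~ Unif[-1, 1]
    R = (matern15(np.linalg.norm(X[:, None] - X[None], axis=-1), phi[0])
         * matern15(np.abs(e[:, None] - e[None]), phi[1])
         * matern15(np.abs(t[:, None] - t[None]), phi[2]))   # Equation (3)
    L = np.linalg.cholesky(sigma2 * R + 1e-8 * np.eye(n))  # jitter: our assumption
    S = L @ rng.standard_normal(n)                         # S ~ MVN(0, sigma2 R)
    D = np.column_stack([np.ones(n), e, t])
    Z = rng.normal(0.0, np.sqrt(tau2), size=n)             # nugget effect
    return X, e, t, D @ np.asarray(beta) + S + Z

rng = np.random.default_rng(2024)
X, e, t, Y = simulate_model1(n=300, beta=[1.0, 0.5, -0.5], sigma2=0.5,
                             phi=(0.3, 0.2, 0.1), tau2=0.1, rng=rng)  # tau2 hypothetical
```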
3.1. Evaluating accuracy of the parameter estimate
Following Burton et al. (2006), the accuracy of the estimator $\hat\theta_k$ of each parameter $\theta_k$ is assessed by the percentage relative bias (PRB),

$$\mathrm{PRB}_k = \frac{100}{B}\sum_{j=1}^{B} \frac{\hat\theta_k^{(j)} - \theta_k}{\theta_k},$$

where $B$ is the number of simulated datasets, $\hat\theta_k^{(j)}$ is the estimate of the $k$-th parameter obtained from the $j$-th dataset, and $\theta_k$ is its true value. This measure assumes that none of the parameters is equal to zero. The PRB indicates both the direction and the magnitude of the bias: a negative PRB indicates underestimation, and a positive PRB indicates overestimation.

We also calculated the coverage of the 95% Wald-type confidence interval, that is, the proportion of simulated datasets in which the interval contained the true parameter value. The coverage probability is

$$\mathrm{CP}_k = \frac{1}{B}\sum_{j=1}^{B} I\left(\hat\theta_{k,\mathrm{lower}}^{(j)} \le \theta_k \le \hat\theta_{k,\mathrm{upper}}^{(j)}\right),$$

where $I(\cdot)$ is the indicator function and $\hat\theta_{k,\mathrm{lower}}^{(j)}$ and $\hat\theta_{k,\mathrm{upper}}^{(j)}$ are the lower and upper limits of the Wald-type 95% confidence interval for $\theta_k$ computed from the $j$-th dataset.
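Both summaries are straightforward to compute from the replicate estimates. The Python (NumPy) sketch below assumes normal-theory Wald intervals θ̂ ± 1.96 × SE; the function names and the four replicate estimates are hypothetical numbers chosen only to make the arithmetic easy to follow.

```python
import numpy as np

def percentage_relative_bias(estimates, true_value):
    """PRB = 100/B * sum_j (theta_hat^(j) - theta) / theta; theta must be non-zero."""
    est = np.asarray(estimates, dtype=float)
    return 100.0 * np.mean((est - true_value) / true_value)

def wald_coverage(estimates, std_errors, true_value, z=1.959964):
    """Share of replicates whose interval theta_hat +/- z * SE contains theta."""
    est = np.asarray(estimates, dtype=float)
    se = np.asarray(std_errors, dtype=float)
    lower, upper = est - z * se, est + z * se
    return np.mean((lower <= true_value) & (true_value <= upper))

# Hypothetical estimates of sigma2 (true value 0.5) from B = 4 replicates
est = [0.55, 0.45, 0.60, 0.50]
se = [0.02, 0.02, 0.02, 0.02]
print(percentage_relative_bias(est, 0.5))   # approximately 5.0
print(wald_coverage(est, se, 0.5))          # 0.25: only the last interval covers 0.5
```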

3.2. Evaluating the predictive performance of the model
We assess predictive performance using three metrics: bias, root mean square error (RMSE) and coverage probability (CP), defined as

$$\mathrm{bias} = \frac{1}{nB}\sum_{i=1}^{n}\sum_{j=1}^{B}\left(\hat Y_i^{(j)} - Y_i^{(j)}\right),$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{nB}\sum_{i=1}^{n}\sum_{j=1}^{B}\left(\hat Y_i^{(j)} - Y_i^{(j)}\right)^2},$$

$$\mathrm{CP} = \frac{1}{nB}\sum_{i=1}^{n}\sum_{j=1}^{B} I\left(Y_i^{(j)} \in \mathrm{PI}_{i,0.95}^{(j)}\right),$$

where $Y_i^{(j)}$ and $\hat Y_i^{(j)}$ are the true and predicted values of the outcome at location $i$ in the $j$-th simulated dataset, and $I(Y_i^{(j)} \in \mathrm{PI}_{i,0.95}^{(j)})$ is an indicator function that equals 1 if $Y_i^{(j)}$ lies inside the 95% prediction interval $\mathrm{PI}_{i,0.95}^{(j)}$ and 0 otherwise.
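The three prediction metrics can be computed in one pass over a (B, n) array of predictions. The Python (NumPy) sketch below uses the usual pooled root-mean-square definition of RMSE; the function name and the 2 × 2 toy arrays with ±0.6 intervals are hypothetical.

```python
import numpy as np

def prediction_metrics(y_true, y_pred, lower, upper):
    """Pooled bias, RMSE and empirical coverage; every input has shape (B, n)."""
    y_true = np.asarray(y_true, dtype=float)
    err = np.asarray(y_pred, dtype=float) - y_true
    bias = err.mean()
    rmse = np.sqrt(np.mean(err ** 2))
    covered = (np.asarray(lower) <= y_true) & (y_true <= np.asarray(upper))
    return bias, rmse, covered.mean()

# Hypothetical toy case: B = 2 replicates, n = 2 locations, intervals of +/- 0.6
y_true = np.array([[1.0, 2.0], [0.0, -1.0]])
y_pred = np.array([[1.5, 2.0], [0.0, -2.0]])
bias, rmse, cp = prediction_metrics(y_true, y_pred, y_pred - 0.6, y_pred + 0.6)
print(bias, rmse, cp)   # -0.125, sqrt(0.3125) (about 0.559), 0.75
```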
3.3. Simulation result
To illustrate the proposed processes, Figure 1 shows simulated surfaces of the covariates $e$ and $t$, the stationary process $S(x)$ (obtained when the covariates are not included), and the resulting non-stationary processes for Models 1, 2 and 3. The figure was generated using $(\sigma^2, \phi_1, \phi_2, \phi_3)^\top = (0.5, 0.3, 0.2, 0.1)^\top$. Unlike the stationary process, the non-stationary processes show spatial patterns that change across the domain.
[Figure 1 panels: "Covariate (e)", "Covariate (t)", "Stationary", "Non-stationary - model 1", "Non-stationary - model 2", "Non-stationary - model 3"; both axes span 0.0 to 1.0]
Figure 1: An example of a simulated surface of the spatial covariates e (top left panel) and t (top middle panel), stationary process (top right panel)
and non-stationary process from models 1, 2 and 3 (lower panels).
Table 1 shows, for each scenario, the percentage relative bias and the coverage probability of the parameter estimates. The models show varying degrees of bias across scenarios. In Scenario 1, Model 1 has the smallest bias; in Scenario 2, Model 2 has the smallest bias; and in Scenario 3, Model 3 has the smallest bias. These findings underscore the importance of selecting a model that matches the data-generating process. The coverage probability quantifies how often the 95% confidence intervals contain the true parameter values. In Scenario 1, the coverage probabilities for Model 1 are close to the nominal 95%, whereas Model 2 is conservative (coverage above 95%) and Model 3 is anti-conservative (coverage below 95%). In Scenario 2, the coverage probabilities for Model 2 are close to the nominal 95%, whereas Models 1 and 3 are conservative. In Scenario 3, the coverage probabilities for Model 3 are close to the nominal 95%, whereas Models 1 and 2 are conservative.
Table 2 shows the bias, RMSE and coverage probability of the predictions. In Scenario 1, Model 1 has the smallest absolute bias and RMSE and achieves coverage close to the nominal level (95.32%). The same pattern holds across scenarios: Model 2 performs best in Scenario 2 and Model 3 in Scenario 3. These results emphasise the importance of selecting an appropriate predictive model: predictive performance depends on whether the data-generating process aligns with the assumptions of the fitted model.
Table 1: Summary of the results of the simulation study: percentage relative bias (PRB) and coverage probability (CP) of the 95% confidence intervals for the parameters

Scenario    Parameter   PRB Model 1   PRB Model 2   PRB Model 3   CP Model 1   CP Model 2   CP Model 3
Scenario 1  β0          -17.18        17.40         88.28         95.25%       95.63%       94.90%
            β1          -0.42         3.11          1.13          95.97%       95.99%       94.95%
            β2          0.80          -1.03         -1.10         95.97%       95.99%       94.95%
            σ²          16.40         34.97         -26.91        95.20%       98.53%       94.90%
            ϕ1          13.60         -64.19        -99.50        95.83%       98.72%       94.32%
            ϕ2          304.08        1453.31       3.78e+09      95.43%       98.73%       93.98%
            ϕ3          598.46        3527.57       7.99e+03      95.39%       98.85%       93.59%
Scenario 2  β0          -62.50        -44.38        72.09         95.41%       95.06%       95.45%
            β1          0.05          0.01          -0.43         95.68%       95.30%       95.78%
            β2          -0.18         -0.04         -1.67         95.63%       95.56%       95.48%
            σ²          410.13        16.12         454.15        96.52%       95.96%       96.65%
            ϕ1          50.79         18.59         -95.74        96.68%       95.79%       97.36%
            ϕ2          498.02        198.40        4237.16       96.04%       94.94%       97.56%
            ϕ3          557.83        90.31         1624.61       96.32%       94.94%       97.12%
Scenario 3  β0          -302.23       -144.04       -26.22        95.75%       95.66%       95.61%
            β1          -0.04         1.35          0.02          95.05%       95.35%       95.00%
            β2          2.28          0.95          0.29          95.45%       95.42%       95.39%
            σ²          1.76e+09      3.16e+05      187.29        96.58%       96.32%       95.36%
            ϕ1          3546.46       291.73        6.20          96.53%       96.42%       95.36%
            ϕ2          3442.31       1114.59       261.41        96.86%       96.54%       95.33%
            ϕ3          1.78e+04      5.34e+05      741.21        96.05%       96.52%       95.33%

Table 2: Summary of the results of the simulation study: bias, RMSE and coverage probability for the prediction of the outcome Y

Scenario    Model     Bias        RMSE        Coverage probability
Scenario 1  Model 1   -3.05e-10   5.48e-09    95.32%
            Model 2   1.86e-09    1.37e-08    96.76%
            Model 3   9.82e-08    2.64e-07    97.78%
Scenario 2  Model 1   -3.87e-11   2.91e-09    95.74%
            Model 2   -1.16e-11   2.38e-09    95.37%
            Model 3   -1.60e-10   1.10e-08    96.75%
Scenario 3  Model 1   2.56e-05    1.65e-04    97.93%
            Model 2   9.82e-08    2.64e-07    96.78%
            Model 3   -1.00e-08   5.83e-08    95.22%
4. Application of the proposed method to malaria in Mozambique
To illustrate the proposed approach, we fitted the non-stationary geostatistical model to map malaria prevalence in Mozambique, using malaria prevalence data obtained from the Malaria Atlas Project (www.malariaatlas.org). The outcome was the empirical logit of prevalence. Demographic and environmental covariates known to influence malaria transmission, such as altitude, temperature, precipitation, humidity and proximity to water sources, were considered for inclusion in the model. The data were collected at 447 distinct locations, and Figure 2 (top-left panel) shows the predicted prevalence of malaria. These data have previously been analysed by Moraga et al. (2021) and Ejigu and Moraga (2023).
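The exact form of the empirical logit is not stated here; a common choice, assumed in the Python (NumPy) sketch below, is log{(y + 0.5)/(m − y + 0.5)} for y positive individuals out of m examined. The 0.5 correction keeps the transform finite at sites with 0% or 100% observed prevalence. The function name and the counts are hypothetical.

```python
import numpy as np

def empirical_logit(positives, examined):
    """log((y + 0.5) / (m - y + 0.5)); finite even for 0% or 100% prevalence."""
    y = np.asarray(positives, dtype=float)
    m = np.asarray(examined, dtype=float)
    return np.log((y + 0.5) / (m - y + 0.5))

# Hypothetical survey sites: 0, 10 and 20 positives out of 20 examined
print(empirical_logit([0, 10, 20], [20, 20, 20]))   # symmetric about 0; middle value is 0
```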
Demographic and environmental covariates
Data on temperature, precipitation and altitude were obtained from the WorldClim database (www.worldclim.org), data on distance to inland water were derived from the WorldPop database (www.worldpop.org), and humidity data were extracted from the University of East Anglia Climatic Research Unit database (UEACRU, www.cru.uea.ac.uk) using the R package raster.
Model formulation
First, a non-spatial model including all available covariates was fitted. This initial model indicated that precipitation was not significant, whereas the other covariates were, in line with the findings of Ejigu and Moraga (2023). We therefore retained altitude, temperature, humidity and distance to inland water.

All four variables were then included in the mean structure of the model, and two of them were also included in the covariance function. The pair of variables for the covariance function was chosen as the combination yielding the lowest AIC and BIC values. We explored the three model formulations in Equations (6), (7) and (8) and, to allow flexibility in smoothness, let each smoothness parameter κ take the values 0.5, 1.5 or 2.5. The final model was the one with the smallest AIC and BIC values.
Hence, our final selected model is

$$Y_i = \beta_0 + \beta_1 \mathrm{Altitude}_i + \beta_2 \mathrm{Temperature}_i + \beta_3 \mathrm{Humidity}_i + \beta_4 \mathrm{Distance}_i + S(x_i, \mathrm{Altitude}_i) + S(x_i, \mathrm{Temperature}_i) + Z_i, \tag{9}$$

with the Altitude and Temperature covariates incorporated into the covariance function, and $\kappa_1 = 1.5$, $\kappa_2 = 2.5$ and $\kappa_3 = 1.5$.
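One way to organise the selection procedure described above is sketched below in Python (standard library only). It enumerates every pair of the four retained covariates, each of the three structures in Equations (6), (7) and (8), and each combination of κ values in {0.5, 1.5, 2.5} for the three correlation functions, then keeps the candidate with the smallest AIC = 2k − 2ℓ, computing BIC = k ln n − 2ℓ alongside. Whether the authors searched jointly like this or sequentially is not stated, and `fit` is a hypothetical user-supplied function that would wrap the likelihood maximisation and return the maximised log-likelihood ℓ and the number of parameters k.

```python
import math
from itertools import combinations, product

COVARIATES = ["Altitude", "Temperature", "Humidity", "Distance"]
STRUCTURES = ["Model 1", "Model 2", "Model 3"]      # Equations (6), (7) and (8)
KAPPAS = [0.5, 1.5, 2.5]

def information_criteria(loglik, k, n):
    """Return (AIC, BIC) for maximised log-likelihood `loglik`, k parameters, n observations."""
    return 2 * k - 2 * loglik, k * math.log(n) - 2 * loglik

def candidate_grid():
    """All (covariate pair, structure, (kappa1, kappa2, kappa3)) combinations."""
    return list(product(combinations(COVARIATES, 2), STRUCTURES,
                        product(KAPPAS, repeat=3)))

def select(fit, n):
    """Fit every candidate with the hypothetical `fit`; return (AIC, BIC, candidate) with smallest AIC."""
    scored = []
    for cand in candidate_grid():
        loglik, k = fit(cand)
        aic, bic = information_criteria(loglik, k, n)
        scored.append((aic, bic, cand))
    return min(scored, key=lambda s: s[0])

print(len(candidate_grid()))   # 6 pairs x 3 structures x 27 kappa triples = 486
```

AIC and BIC can disagree; selecting on AIC here is only one possible tie-breaking choice.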

For comparison, we also fitted a stationary geostatistical model to the data, in which the spatial random effect depends only on the separation distance between locations:

$$Y_i = \beta_0 + \beta_1 \mathrm{Altitude}_i + \beta_2 \mathrm{Temperature}_i + \beta_3 \mathrm{Humidity}_i + \beta_4 \mathrm{Distance}_i + S(x_i) + Z_i. \tag{10}$$
Result and discussion
Table 3 provides the parameter estimates and their 95% confidence intervals for the stationary and non-stationary models. Under both models, altitude, temperature and humidity show significant positive associations with malaria prevalence, suggesting that higher altitude, higher temperature and higher humidity are associated with increased malaria prevalence. In contrast, distance to inland water is not statistically significant, which challenges our initial hypothesis of elevated malaria prevalence close to water bodies; Moraga et al. (2021) made the same observation. The estimates of the spatial scale parameter ϕs are similar for the two models. The estimated nugget variance τ² is, however, considerably larger under the stationary model (1.36) than under the non-stationary model (0.54). This may be because, in the stationary model, variability associated with altitude and temperature is absorbed into the nugget, whereas the non-stationary model accounts for it through the covariance function.
Figure 2 shows the predicted mean and upper and lower 95% prediction intervals for the stationary (bottom panels)
and the non-stationary (upper panels) models. The predicted mean prevalence of malaria is higher in the northern and
central parts of the country and lower in the southern part of the country. The prevalence surfaces for the stationary
and non-stationary models look similar. However, the 95% prediction intervals for the non-stationary model appear
wider than those of the stationary model, suggesting that the non-stationary model can potentially capture a broader
range of uncertainties.
Table 3: Parameter estimates with the corresponding 95% confidence intervals for the non-stationary and stationary models.

              Non-stationary                       Stationary
Parameter     Estimate    95% CI                   Estimate    95% CI
β0            -30.1355    (-39.1782, -21.0929)     -26.0012    (-35.2202, -16.7821)
β1            0.0026      (0.0016, 0.0037)         0.0023      (0.0012, 0.0034)
β2            0.5288      (0.3628, 0.6948)         0.4360      (0.2675, 0.6045)
β3            0.1705      (0.1112, 0.2299)         0.1546      (0.0939, 0.2152)
β4            0.0139      (-0.0001, 0.0281)        0.0120      (-0.0022, 0.0262)
σ²            0.3541      (0.1697, 0.5385)         0.5200      (0.1905, 0.8496)
ϕs            3.1850      (1.8273, 6.1873)         3.7224      (0.4359, 7.0089)
ϕAltitude     2.4716      (0.9596, 4.9029)         -           -
ϕTemperature  2.1977      (0.1146, 3.8638)         -           -
τ²            0.5368      (0.2058, 0.8678)         1.3579      (0.0093, 2.7252)
AIC           907.8887                             915.3301
BIC           934.9266                             944.9604

Figure 2: Maps of the results from the proposed non-stationary model (top panels) and the stationary model (bottom panels): predicted mean prevalence (left panels), lower limit of the 95% prediction interval (middle panels) and upper limit of the 95% prediction interval (right panels).

5. Discussion
This paper introduces an approach for modelling non-stationary processes in geostatistical settings by incorporating multiple covariates into the spatial process, and presents three distinct ways of doing so. Results from a comprehensive simulation study underscore the importance of selecting an appropriate geostatistical model when introducing a non-stationary process. The methodology is illustrated through an analysis of malaria prevalence in Mozambique, fitting both the proposed non-stationary model and a stationary model. We conclude that the non-stationary model made a material difference to our predictive inferences for malaria prevalence, as reflected in the uncertainty of the predictions. The model was implemented in the R statistical software (R Core Team, 2023), and the code used for the analysis is available at https://github.com/olatunjijohnson/Nonstationary_paper.
Throughout the paper, we used two covariates to model the non-stationary process. However, the framework can readily be extended to $p$ covariates $e_1, \ldots, e_p$, in which case Equations (3), (4) and (5) become

$$\mathrm{Cov}\{S(x, e_1, \ldots, e_p), S(x', e_1', \ldots, e_p')\} = \sigma^2 \rho_1(x, x'; \phi_1) \prod_{j=1}^{p} \rho_{j+1}(e_j, e_j'; \phi_{j+1}),$$

$$\mathrm{Cov}\{S(x, e_1, \ldots, e_p), S(x', e_1', \ldots, e_p')\} = \sigma^2 \rho_1(x, x'; \phi_1) \sum_{j=1}^{p} \rho_{j+1}(e_j, e_j'; \phi_{j+1}),$$

and

$$\mathrm{Cov}\{S(x, e_1, \ldots, e_p), S(x', e_1', \ldots, e_p')\} = \sigma^2 \left\{\rho_1(x, x'; \phi_1) + \sum_{j=1}^{p} \rho_{j+1}(e_j, e_j'; \phi_{j+1})\right\},$$

respectively, where $\rho_{j+1}$ is the correlation function associated with covariate $e_j$; with $p = 2$, $e_1 = e$ and $e_2 = t$, these reduce to Equations (3), (4) and (5). This is particularly valuable in disease mapping when numerous environmental variables drive disease dynamics and the underlying process is non-stationary, varying across different spatial and temporal scales. Using multiple covariates allows a more nuanced representation of disease-environment interactions, capturing complex relationships that may not be evident with fewer covariates.
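The three generalised structures can be written as one function of an n × p covariate matrix. The Python (NumPy) sketch below again assumes the κ = 1.5 Matérn correlation (1 + u/ϕ)exp(−u/ϕ) for every factor, with the first scale parameter for space and one per covariate after it; the function name and the structure labels are hypothetical. With p = 2 it reproduces Equations (3), (4) and (5), and its diagonal shows that the marginal variance is σ², pσ² and (p + 1)σ² under the three structures.

```python
import numpy as np

def matern15(u, phi):
    """Matern correlation with kappa = 1.5: (1 + u/phi) * exp(-u/phi)."""
    z = u / phi
    return (1.0 + z) * np.exp(-z)

def general_cov(X, E, sigma2, phis, structure):
    """X: (n, 2) locations; E: (n, p) covariates; phis[0] is the spatial scale,
    phis[j + 1] the scale for covariate column j."""
    r_space = matern15(np.linalg.norm(X[:, None] - X[None], axis=-1), phis[0])
    r_cov = [matern15(np.abs(E[:, j, None] - E[None, :, j]), phis[j + 1])
             for j in range(E.shape[1])]
    if structure == "product":            # generalises Eq. (3)
        return sigma2 * r_space * np.prod(r_cov, axis=0)
    if structure == "sum_of_products":    # generalises Eq. (4)
        return sigma2 * r_space * np.sum(r_cov, axis=0)
    if structure == "sum":                # generalises Eq. (5)
        return sigma2 * (r_space + np.sum(r_cov, axis=0))
    raise ValueError(f"unknown structure: {structure}")

rng = np.random.default_rng(3)
X = rng.uniform(size=(100, 2))
E = rng.uniform(-1.0, 1.0, size=(100, 4))   # p = 4 hypothetical covariates
phis = [0.3, 0.2, 0.2, 0.1, 0.1]
for s in ("product", "sum_of_products", "sum"):
    print(s, general_cov(X, E, 0.5, phis, s)[0, 0])   # 0.5, 2.0, 2.5
```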
Careful selection of covariates remains crucial for accurately representing the underlying spatial process. These covariates can also be incorporated into the mean structure of the geostatistical model, further enriching its predictive capability. This flexibility underscores the robustness of the framework and its adaptability to diverse disease mapping scenarios, particularly in the presence of non-stationary and multi-scale drivers.
There are other ways in which covariates can be incorporated into the covariance function. Risser (2016) provides an excellent review of classes of non-stationary models. One can also allow the parameters of the covariance function to vary spatially, resulting in a non-stationary process. Ingebrigtsen et al. (2014) show that using stochastic partial differential equations for spatial modelling allows covariate information to be easily introduced into the dependence structure.
This analysis has several limitations. Although various covariance functions could be considered, we restricted ourselves to the Matérn family with a fixed smoothness parameter κ. Future work could allow this parameter to be estimated from the data, bearing in mind that, as noted by Zhang (2004), estimating κ typically requires a substantial amount of data.
Future work includes extending the current framework to account for anisotropy, that is, directional dependence. To achieve this, the Mahalanobis distance between locations, as suggested by Schmidt et al. (2011), may be considered. There is also an opportunity to extend the model to non-linear settings, in which the relationship between the outcome and the covariates is modelled through smooth functions.
Competing interests
The authors declare that there is no conflict of interest regarding the publication of this article.

Funding
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit
sectors.
Data accessibility
References
Amoah, B., Fronterre, C., Johnson, O., Dejene, M., Seife, F., Negussu, N., Bakhtiari, A., Harding-Esch, E.M., Giorgi, E., Solomon, A.W., et al., 2022. Model-based geostatistics enables more precise estimates of neglected tropical-disease prevalence in elimination settings: mapping trachoma prevalence in Ethiopia. International Journal of Epidemiology 51, 468–478.
Burton, A., Altman, D.G., Royston, P., Holder, R.L., 2006. The design of simulation studies in medical statistics. Statistics in Medicine 25, 4279–4292.
Diggle, P.J., Tawn, J.A., Moyeed, R.A., 1998. Model-based geostatistics. Journal of the Royal Statistical Society Series C: Applied Statistics 47, 299–350.
Ejigu, B.A., Moraga, P., 2023. A new way of analyzing malaria data: A non-stationary geostatistical modeling approach.
Ejigu, B.A., Wencheko, E., Moraga, P., Giorgi, E., 2020. Geostatistical methods for modelling non-stationary patterns in disease risk. Spatial Statistics 35, 100397.
Higdon, D., 1998. A process-convolution approach to modelling temperatures in the North Atlantic Ocean. Environmental and Ecological Statistics 5, 173–190.
Ingebrigtsen, R., Lindgren, F., Steinsland, I., 2014. Spatial models with explanatory variables in the dependence structure. Spatial Statistics 8, 20–38.
Johnson, O., Giorgi, E., Fronterrè, C., Amoah, B., Atsame, J., Ella, S.N., Biamonte, M., Ogoussan, K., Hundley, L., Gass, K., et al., 2022. Geostatistical modelling enables efficient safety assessment for mass drug administration with ivermectin in Loa loa endemic areas through a combined antibody and LoaScope testing strategy for elimination of onchocerciasis. PLoS Neglected Tropical Diseases 16, e0010189.
Matérn, B., 1960. Spatial variation. Technical report. Statens Skogsforskningsinstitut, Stockholm.
Mogaji, H.O., Johnson, O.O., Adigun, A.B., Adekunle, O.N., Bankole, S., Dedeke, G.A., Bada, B.S., Ekpo, U.F., 2022. Estimating the population at risk with soil transmitted helminthiasis and annual drug requirements for preventive chemotherapy in Ogun State, Nigeria. Scientific Reports 12, 2027.
Moraga, P., Dean, C., Inoue, J., Morawiecki, P., Noureen, S.R., Wang, F., 2021. Bayesian spatial modelling of geostatistical data using INLA and SPDE methods: A case study predicting malaria risk in Mozambique. Spatial and Spatio-temporal Epidemiology 39, 100440.
Paciorek, C.J., Schervish, M.J., 2006. Spatial modelling using a new class of nonstationary covariance functions. Environmetrics 17, 485–506.
R Core Team, 2023. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. URL: https://www.R-project.org/.
Risser, M.D., 2016. Nonstationary spatial modeling, with emphasis on process convolution and covariate-driven approaches. arXiv preprint arXiv:1610.02447.
Sampson, P.D., Guttorp, P., 1992. Nonparametric estimation of nonstationary spatial covariance structure. Journal of the American Statistical Association 87, 108–119.
Schmidt, A.M., Guttorp, P., O'Hagan, A., 2011. Considering covariates in the covariance structure of spatial processes. Environmetrics 22, 487–500.
Zhang, H., 2004. Inconsistent estimation and asymptotically equal interpolations in model-based geostatistics. Journal of the American Statistical Association 99, 250–261.
