Functional Spatial Autoregressive Models
Tadao Hoshino∗
October 2, 2024
Abstract
This study introduces a novel spatial autoregressive model in which the dependent variable is
a function that may exhibit functional autocorrelation with the outcome functions of nearby units.
This model can be characterized as a simultaneous integral equation system, which, in general, does
not necessarily have a unique solution. To address this issue, we provide a simple condition on the magnitude of the spatial interaction that ensures the uniqueness of the data realization. For estimation, to account for the endogeneity caused by the spatial interaction, we propose a regularized two-stage least squares estimator based on a basis approximation of the functional parameter. The asymptotic properties of the estimator, including consistency and asymptotic normality, are investigated under certain conditions. Additionally, we propose a simple Wald-type test for detecting the presence of spatial effects. As an empirical illustration, we apply the proposed model and method to analyze age distributions in Japanese cities.
∗School of Political Science and Economics, Waseda University, 1-6-1 Nishi-waseda, Shinjuku-ku, Tokyo 169-8050,
Japan. Email: thoshino@waseda.jp.
arXiv:2402.14763v3  [econ.EM]  1 Oct 2024

1 Introduction
Spatial interdependence among units is an essential element in spatial data analysis. To incorporate
spatial interactions into econometric analysis, researchers have extensively utilized the Spatial Auto-
Regressive (SAR) model:
$$y_i = \alpha_0 \sum_{j=1}^n w_{i,j} y_j + x_i^\top \beta_0 + \varepsilon_i, \tag{1.1}$$
where $y_i$ denotes a scalar outcome, $w_{i,j}$ denotes a known spatial weight between $i$ and $j$, $x_i$ denotes a vector of explanatory variables, and $\varepsilon_i$ denotes an error term. The spatial lag term $\sum_{j=1}^n w_{i,j} y_j$ captures the spatial trend of the outcome variable in the neighborhood of $i$, and the scalar parameter $\alpha_0$ measures its impact. The usefulness of SAR modelling (1.1) has been demonstrated in various empirical fields, including regional economics, local politics, real estate, and crime. In addition, if the weight $w_{i,j}$ is defined by social distance or friendship connections instead of geographic distance, SAR models can be used to analyze social network data, which greatly broadens their applicability.
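As a point of reference for the functional extension, the scalar model (1.1) can be simulated through its reduced form $y = (I_n - \alpha_0 W_n)^{-1}(X\beta_0 + \varepsilon)$. The following is a minimal NumPy sketch, assuming a hypothetical ring-shaped network and arbitrary parameter values chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50

# Hypothetical ring network: each unit's two immediate neighbours get weight 1/2,
# so W is row-normalized with a zero diagonal.
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = 0.5
    W[i, (i + 1) % n] = 0.5

alpha0 = 0.4                                   # |alpha0| < 1 keeps I_n - alpha0 * W invertible
beta0 = np.array([1.0, 2.0])
X = np.column_stack([np.ones(n), rng.normal(size=n)])
eps = rng.normal(scale=0.1, size=n)

# Reduced form of (1.1): y = (I_n - alpha0 * W)^{-1} (X beta0 + eps)
y = np.linalg.solve(np.eye(n) - alpha0 * W, X @ beta0 + eps)
```

The simulated $y$ satisfies the structural equation (1.1) exactly, which is a convenient sanity check before moving to functional outcomes.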
To further broaden the applications of SAR modelling, this study aims to extend (1.1) to a functional
SAR model where the dependent variable is a function defined on a common closed interval:
$$q_i(s) = \sum_{j=1}^n w_{i,j} \int_0^1 q_j(t)\,\alpha_0(t,s)\,dt + x_i^\top \beta_0(s) + \varepsilon_i(s), \quad s \in [0,1], \tag{1.2}$$
where $q_i : [0,1] \to \mathbb{R}$ denotes the outcome function of interest. Restricting the support to $[0,1]$ is a normalization. In particular, for empirical relevance, this study primarily focuses on the case in which $q_i$ is the quantile function of a scalar dependent variable of interest. Regression models involving functional variables have been widely studied in the functional data analysis (FDA) literature for several decades (e.g., Ramsay and Silverman, 2005). Our model differs essentially from the existing ones in that we explicitly consider the simultaneous spatial interactions of the outcome functions.
As a motivating example, suppose we intend to investigate the impact of a regional childcare subsidy
program in a given city on the age distribution of the city. The policy is likely to attract households
with young children from other regions to benefit from the subsidy. Additionally, if childcare facilities and schools need to be newly constructed, inflows of workers from other age groups can also be anticipated. To obtain a comprehensive picture of the shift in the age distribution owing to the subsidy program as a whole, it would be natural to consider a regression model in which the dependent variable represents the age distribution of each city, such as its quantile function. Meanwhile, when the young population of a given city is growing (whatever the cause), this growth can drive the city's economic growth and may, through spatial spillovers of economic activity, also attract working-age population to the surrounding regions. The proposed functional
SAR model (1.2) is able to account for such interdependency between the outcome functions of nearby
spatial units.
We are not the first to consider SAR-type modelling in the functional regression context. Zhu et al. (2022) proposed a social network model similar to ours in a time-series setting, where the response variable is a function of time. They assumed that only concurrent interactions exist, so that the past and future outcomes of others do not affect the present outcome. Consequently, at each fixed time point, their model reduces to the standard SAR model (1.1). In this regard, our model may be viewed as a generalization of theirs in which $\alpha_0(t,s) \neq 0$ is allowed for $t \neq s$.
Another modelling approach related to ours is SAR quantile regression (e.g., Su and Yang, 2011; Malikov et al., 2019; Ando et al., 2023). When $q_i$ represents a quantile function, our model and theirs are conceptually similar in that both approaches can examine the distributional effects of explanatory variables on the outcome and the spatial interaction of outcomes in a unified framework. However, a fundamental distinction is that, in our model, each unit has its own quantile function as the dependent variable. Consequently, we can explicitly allow each specific quantile of an outcome to interact with other quantiles of others' outcomes. For instance, our model can investigate the impact of the neighbors' median outcome on, say, the 10th percentile of one's own outcome. In the time-series context, Dong et al. (2024) consider the same type of interaction structure.
Notice that our model (1.2) is characterized as a simultaneous integral equation system, and to the
best of our knowledge, this type of modelling has not been investigated in the econometrics literature.
To construct a consistent estimator for our model, the model space should be restricted such that the realized $q_i$'s are uniquely (in some sense) associated with the true parameters. We show that, as in the standard SAR model (cf. Kelejian and Prucha, 2010), establishing this uniqueness property requires the spatial effect $\alpha_0$ to be bounded within a certain range. In particular, we demonstrate that the tightness of the bound required for $\alpha_0$ depends on the smoothness of the outcome function.
To estimate the model parameters, we need to address the endogeneity issue arising from the simultaneous interaction among the outcome functions. To this end, we propose a regularized two-stage least squares (2SLS) estimator based on a series approximation of $\alpha_0(\cdot, s)$ at each evaluation point $s$. Given a sufficient number of instrumental variables (IVs) and under regularity conditions, we prove that the estimators of $\beta_0$ and of $\alpha_0(t,s)$ are both consistent, with certain convergence rates, and asymptotically normally distributed. Additionally, we develop a Wald-type test for assessing the presence of any spatial effects at each $s$, and show that the proposed test statistic, after appropriate normalization, is asymptotically standard normal. Furthermore, we discuss how to perform the estimation when the outcome functions are not fully observed on the entire interval $[0,1]$ but only at discrete points, which is typical in most empirical situations. Our proposed estimator relies on a simple interpolation method, and we derive a set of conditions under which it achieves the same asymptotic properties as its infeasible counterpart.
As an empirical illustration, we investigate the determinants of the age distribution in Japanese cities. Many Japanese cities are aging rapidly, which has become one of the country's central social problems, so understanding the mechanisms that shape the age structure of cities is crucial. Using recent government survey data, including the Census, we apply our estimation and testing methods to 1883 Japanese cities. Here, the outcome function $q_i$ represents the quantile function of the age distribution in city $i$, and the covariates $x_i$ include annual commercial sales, the unemployment rate, the number of childcare facilities, and others. Our results suggest that spatial interaction effects are extremely weak at quantiles close to the boundary points 0 and 1. This may not be surprising, as all individuals are born at age 0 and few live beyond roughly 100 years, leaving little regional heterogeneity at the extremes. In contrast, strong spatial effects are observed when both $t$ and $s$ correspond to the ages of the young working population, possibly indicating that economic activities and their spillovers are the main factors shaping the spatial trend of age structure.
The remainder of this paper is organized as follows. In Section 2, we formally introduce the proposed model and discuss the condition under which it is well defined with a unique solution. In addition, focusing on the case where the outcome function is a quantile function, we discuss the motivation for and interpretation of this modelling approach. In Section 3, we describe our 2SLS method for estimating $\beta_0$ and $\alpha_0$ and study the asymptotic properties of the proposed estimator under a set of assumptions. In that section, we also propose a statistic for testing the null hypothesis that $\alpha_0(t,s) = 0$ for $t \in I$, where $I$ is a sub-interval of $[0,1]$, and derive its asymptotic distribution. In Section 4, we present the results of Monte Carlo experiments that evaluate the finite sample performance of the proposed estimator and test. Section 5 presents our empirical analysis of the age distribution of Japanese cities, and Section 6 concludes the paper.
Notation
For a natural number $n$, $I_n$ denotes the $n \times n$ identity matrix. For a function $h$ defined on $[0,1]$, the $L^p$ norm of $h$ is written as $\|h\|_{L^p} := (\int_0^1 |h(s)|^p\,ds)^{1/p}$, and $L^p(0,1)$ denotes the set of $h$'s such that $\|h\|_{L^p} < \infty$. For a random variable $x$, the $L^p$ norm of $x$ is written as $\|x\|_p := (E|x|^p)^{1/p}$. For a matrix $A$, $\|A\|$ and $\|A\|_\infty$ denote the Frobenius norm and the maximum absolute row sum of $A$, respectively. If $A$ is a square matrix, we use $\rho_{\max}(A)$ and $\rho_{\min}(A)$ to denote its largest and smallest eigenvalues, respectively. In addition, $A^-$ denotes a symmetric generalized inverse of $A$. We write $a \lesssim b$ and $a \lesssim_p b$ if $a = O(b)$ and $a = O_P(b)$, respectively. Finally, we write $a \sim b$ when $a \lesssim b$ and $b \lesssim a$.
2 Functional SAR Models
2.1 Model Setup and Completeness
Suppose that we have data of size $n$: $\{(q_i, x_i, w_{i,1}, \ldots, w_{i,n})\}_{i=1}^n$, where $q_i$ denotes a random outcome function of interest with the common support $[0,1]$, $x_i = (x_{i,1}, \ldots, x_{i,d_x})^\top \in \mathbb{R}^{d_x}$ denotes a vector of covariates including a constant term, and $w_{i,j} \in \mathbb{R}$ denotes the $(i,j)$-th element of an $n \times n$ pre-specified spatial weight matrix $W_n = (w_{i,j})_{i,j=1}^n$. The value of each $w_{i,j}$ is determined non-randomly. As is conventional, we set $w_{i,i} = 0$ for all $i$ for normalization. Note that the spatial configuration of the units generally changes with the sample size. Thus, the variables generally form triangular arrays, and the model parameters depend on $n$ through spatial interactions. However, when there is no risk of confusion, the dependence on $n$ is suppressed for notational convenience.
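One common way to obtain a pre-specified, row-normalized $W_n$ with $w_{i,i} = 0$ is a $k$-nearest-neighbour rule on unit coordinates. The sketch below assumes this construction; the function name `knn_weights`, the random coordinates, and the choice $k = 4$ are hypothetical and not taken from the paper, and distance ties are assumed away.

```python
import numpy as np

def knn_weights(coords: np.ndarray, k: int) -> np.ndarray:
    """Row-normalized k-nearest-neighbour weight matrix with a zero diagonal."""
    n = coords.shape[0]
    # Pairwise Euclidean distances between unit locations.
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)              # a unit is never its own neighbour: w_{i,i} = 0
    nbrs = np.argsort(dist, axis=1)[:, :k]      # indices of the k closest units in each row
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0 / k   # equal weights summing to one
    return W

rng = np.random.default_rng(1)
coords = rng.uniform(size=(30, 2))              # hypothetical unit locations
W = knn_weights(coords, k=4)
```

Because each row sums to one, $\|W_n\|_\infty = 1$, which is the row-normalized case discussed after Proposition 2.1.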
As shown in (1.2), our working model is
$$q_i(s) = \int_0^1 \bar{q}_i(t)\,\alpha_0(t,s)\,dt + x_i^\top \beta_0(s) + \varepsilon_i(s), \quad s \in [0,1],$$
where $\bar{q}_i := \sum_{j=1}^n w_{i,j} q_j$ denotes the spatial lag of the outcome function. The unknown parameters to be estimated are $\alpha_0$ and $\beta_0 = (\beta_{01}, \ldots, \beta_{0d_x})^\top$. For instance, in our empirical analysis, $q_i(s)$ denotes the $s$-th quantile of the age distribution in city $i$, and $\alpha_0(t,s)$ captures the impact of the $t$-th quantile ages of neighboring cities on the $s$-th quantile age of the city itself. As other examples, $q_i(s)$ could be the $s$-th quantile of the income distribution in city $i$, the $s$-th quantile of the daily activity energy expenditure of person $i$, the number of available bicycles at bicycle-sharing station $i$ at time $s$, and so forth. Hereinafter, we assume that $q_i \in L^p(0,1)$ for some $2 \le p < \infty$ and that $\alpha_0 \in C[0,1]^2$, where $C[0,1]^2$ denotes the set of continuous functions on $[0,1]^2$.
Before turning to the estimation of $\alpha_0$ and $\beta_0$, we discuss the completeness of our model, that is, whether model (1.2) is characterized by a unique solution $(q_1, \ldots, q_n)$. As our model comprises a system of $n$ functional equations, the existence and uniqueness of the solution are non-trivial problems. If the system has no solution or multiple solutions, consistently estimating the model parameters without some ad hoc assumptions is generally impossible.
Let $Q(s) = (q_1(s), \ldots, q_n(s))^\top$, $X = (x_1, \ldots, x_n)^\top$, and $E(s) = (\varepsilon_1(s), \ldots, \varepsilon_n(s))^\top$. Then, we can rewrite (1.2) in matrix form as
$$Q(s) = W_n \int_0^1 Q(t)\,\alpha_0(t,s)\,dt + X\beta_0(s) + E(s).$$
This expression shows that our model is a system of Fredholm integral equations of the second kind with kernel $\alpha_0(t,s)$. Defining $\bar{\alpha}_0 := \max_{(t,s) \in [0,1]^2} |\alpha_0(t,s)|$, whose existence is ensured by the continuity of $\alpha_0$, we assume the following:
Assumption 2.1. $\bar{\alpha}_0 \lesssim 1$ and $\|W_n\|_\infty \lesssim 1$ such that $\bar{\alpha}_0 \|W_n\|_\infty < 1$.
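Assumption 2.1 can be checked numerically for a candidate kernel. The sketch below evaluates $\bar{\alpha}_0$ on a grid and computes $\|W_n\|_\infty$ as the maximum absolute row sum; the kernel `alpha0` and the $3 \times 3$ weight matrix are hypothetical examples. Note that a grid maximum can only understate the true maximum of $|\alpha_0|$, so this is a numerical check rather than a proof.

```python
import numpy as np

def alpha0(t, s):
    """Hypothetical kernel used only for illustration."""
    return 0.8 * np.exp(-((t - s) ** 2) / 0.1)

grid = np.linspace(0.0, 1.0, 201)
T, S = np.meshgrid(grid, grid, indexing="ij")
alpha_bar = np.abs(alpha0(T, S)).max()          # grid approximation of max_{t,s} |alpha0(t, s)|

W = np.array([[0.0, 0.5, 0.5],
              [1.0, 0.0, 0.0],
              [0.5, 0.5, 0.0]])                 # hypothetical row-normalized weights
W_inf = np.abs(W).sum(axis=1).max()             # ||W_n||_inf: maximum absolute row sum

satisfied = alpha_bar * W_inf < 1               # Assumption 2.1 on this grid
```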
Let us denote $\mathcal{H}_{n,p} := \{H = (h_1, \ldots, h_n) : h_i \in L^p(0,1) \text{ for all } i\}$, and define a linear operator $\mathcal{T}$ as
$$(\mathcal{T}H)(s) := W_n \int_0^1 H(t)\,\alpha_0(t,s)\,dt, \quad H \in \mathcal{H}_{n,p},$$
whose range is contained in $\mathcal{H}_{n,p}$ under Assumption 2.1. Thus, we can write $Q = \mathcal{T}Q + X\beta_0 + E$. Then, denoting by $\mathrm{Id}$ the identity operator, if the inverse operator $(\mathrm{Id} - \mathcal{T})^{-1}$ exists, the solution $Q$ of the system is uniquely determined (as an element of $\mathcal{H}_{n,p}$) as $Q = (\mathrm{Id} - \mathcal{T})^{-1}[X\beta_0 + E]$.
The next proposition states that Assumption 2.1 is sufficient for the existence of $(\mathrm{Id} - \mathcal{T})^{-1}$ and the uniqueness of $Q$.
Proposition 2.1. Suppose that Assumption 2.1 holds. Then, $(\mathrm{Id} - \mathcal{T})^{-1}$ exists, and $Q$ is the only solution of (1.2) in the Banach space $(\mathcal{H}_{n,p}, \|\cdot\|_{\infty,p})$, where $\|H\|_{\infty,p} := \max_{1 \le i \le n} \|h_i\|_{L^p}$.
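The proof is straightforward. Under Assumption 2.1, we have
$$\begin{aligned}
\|\{\mathcal{T}H\}_i\|_{L^p} &= \Big\| \sum_{j=1}^n w_{i,j} \int_0^1 h_j(t)\,\alpha_0(t,\cdot)\,dt \Big\|_{L^p} \\
&\le \sum_{j=1}^n |w_{i,j}| \left( \int_0^1 \Big| \int_0^1 h_j(t)\,\alpha_0(t,s)\,dt \Big|^p ds \right)^{1/p} \\
&\le \sum_{j=1}^n |w_{i,j}| \left( \int_0^1 \int_0^1 |h_j(t)|^p\,|\alpha_0(t,s)|^p\,dt\,ds \right)^{1/p} \\
&\le \bar{\alpha}_0 \|W_n\|_\infty \max_{1 \le j \le n} \|h_j\|_{L^p} < \|H\|_{\infty,p} < \infty
\end{aligned} \tag{2.1}$$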
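for any nonzero $H \in \mathcal{H}_{n,p}$, by Minkowski's and Jensen's inequalities. This implies that $\mathcal{T}H \in \mathcal{H}_{n,p}$. As is well known, if the operator norm of $\mathcal{T}$ is smaller than one, $(\mathrm{Id} - \mathcal{T})^{-1}$ exists, and we have the Neumann series expansion $(\mathrm{Id} - \mathcal{T})^{-1} = \sum_{\ell=0}^\infty \mathcal{T}^\ell$, converging in the operator norm (e.g., Theorem 2.14, Kress (2014)). It is immediate from (2.1) that $\|\mathcal{T}H\|_{\infty,p} \le \bar{\alpha}_0 \|W_n\|_\infty < 1$ for any $H$ with $\|H\|_{\infty,p} = 1$, so the operator norm of $\mathcal{T}$ is at most $\bar{\alpha}_0 \|W_n\|_\infty < 1$, which yields the desired result.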
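To see Proposition 2.1 at work, the operator $\mathcal{T}$ can be discretized with a midpoint quadrature rule, so that $\mathcal{T}H \approx W_n H B$ with $B_{ab} = \alpha_0(t_a, s_b)/m$, where $H$ stores $h_i(t_a)$ in row $i$. The sketch below, assuming a hypothetical kernel, network, and $\beta_0$, compares a truncated Neumann series with a direct linear solve of the discretized system $(\mathrm{Id} - \mathcal{T})Q = X\beta_0 + E$, using the identity $\mathrm{vec}(W_n H B) = (B^\top \otimes W_n)\,\mathrm{vec}(H)$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 5, 40
h = 1.0 / m
grid = (np.arange(m) + 0.5) * h                  # midpoint quadrature nodes on [0, 1]

# Hypothetical ingredients: ring weights, a smooth kernel, and beta0(s) = (s, 1 + s).
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = 0.5
    W[i, (i + 1) % n] = 0.5
A = 0.8 * np.exp(-((grid[:, None] - grid[None, :]) ** 2) / 0.1)   # A[a, b] = alpha0(t_a, s_b)
B = h * A                                         # fold quadrature weights into the kernel
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.vstack([grid, 1.0 + grid])              # beta0 on the grid, shape (d_x, m)
F = X @ beta + 0.1 * rng.normal(size=(n, m))      # X beta0(s) + E(s) on the grid

def apply_T(H):
    """Discretized operator: (T H)(s_b) ~= W @ sum_a H(t_a) alpha0(t_a, s_b) / m."""
    return W @ H @ B

# Truncated Neumann series sum_{l <= 200} T^l F via fixed-point iteration.
Q_neumann = F.copy()
for _ in range(200):
    Q_neumann = F + apply_T(Q_neumann)

# Direct solve of (I - T) Q = F, using vec(W H B) = (B^T kron W) vec(H) (column-major vec).
K_op = np.eye(n * m) - np.kron(B.T, W)
Q_direct = np.linalg.solve(K_op, F.flatten(order="F")).reshape((n, m), order="F")
```

Here the contraction factor is roughly $\max_s \int_0^1 |\alpha_0(t,s)|\,dt \approx 0.45$, so 200 iterations are far more than needed; the iteration mirrors the Neumann series, while the direct solve is the practical choice for small grids.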
When the spatial weight matrix is row-normalized such that $\|W_n\|_\infty = 1$, as is often the case in empirical applications, Assumption 2.1 reduces to $\bar{\alpha}_0 < 1$, which resembles the solvability condition $|\alpha_0| < 1$ for the standard linear SAR model (1.1).
Remark 2.1 (Alternative condition). If one imposes a stronger assumption on the space of the input functions, the requirement on the kernel can be relaxed. For example, suppose that $q_i \in C[0,1]$ for all $i$. Then, by the extreme value theorem, the $q_i$'s are bounded. Letting $\mathcal{H}_{n,\infty} := \{H = (h_1, \ldots, h_n) : h_i \in C[0,1] \text{ for all } i\}$ and $\|H\|_{\infty,\infty} := \max_{1 \le i \le n} \max_{s \in [0,1]} |h_i(s)|$, we can easily show that $Q$ is the only solution in the Banach space $(\mathcal{H}_{n,\infty}, \|\cdot\|_{\infty,\infty})$ if $\|W_n\|_\infty \max_{s \in [0,1]} \int_0^1 |\alpha_0(t,s)|\,dt < 1$.¹ If the spatial weight matrix is row-normalized, the condition can be further simplified to $\max_{s \in [0,1]} \int_0^1 |\alpha_0(t,s)|\,dt < 1$, which is a familiar requirement for the solvability of the Fredholm integral equation of the second kind (e.g., Corollary 2.16, Kress (2014)). It is known that compactly supported continuous functions are dense in $L^p$ ($1 \le p < \infty$). Thus, in practice, assuming that all $q_i$'s are continuous is almost harmless, and hence violations of Assumption 2.1 can be tolerated to some extent.
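A simple hypothetical kernel illustrates Remark 2.1: $\alpha_0(t,s) = 1.8\,ts$ has $\bar{\alpha}_0 = 1.8$, so Assumption 2.1 fails under row normalization, yet $\max_s \int_0^1 |\alpha_0(t,s)|\,dt = 0.9 < 1$. The sketch below checks both conditions on a grid; the kernel is chosen only for illustration.

```python
import numpy as np

def alpha0(t, s):
    """Hypothetical kernel: sup |alpha0| = 1.8 exceeds 1, but its t-integrals stay below 1."""
    return 1.8 * t * s

m = 2000
t = (np.arange(m) + 0.5) / m                    # midpoint nodes in t
s = np.linspace(0.0, 1.0, 101)                  # evaluation points in s
vals = np.abs(alpha0(t[:, None], s[None, :]))   # vals[a, b] = |alpha0(t_a, s_b)|

sup_alpha = vals.max()                          # grid version of max_{t,s} |alpha0(t, s)|
col_int = vals.mean(axis=0)                     # midpoint rule for int_0^1 |alpha0(t, s)| dt
W_inf = 1.0                                     # row-normalized weights

assumption_21 = sup_alpha * W_inf < 1           # fails for this kernel
remark_21 = W_inf * col_int.max() < 1           # holds for this kernel
```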
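The Neumann series expansion implies that $Q$ can be expressed as $Q = X\beta_0 + \mathcal{T}X\beta_0 + \mathcal{T}^2 X\beta_0 + \cdots + E + \mathcal{T}E + \mathcal{T}^2 E + \cdots$, that is,
$$Q(\cdot) = X\beta_0(\cdot) + W_n X \int_0^1 \beta_0(t)\,\alpha_0(t,\cdot)\,dt + W_n W_n X \int_0^1 \int_0^1 \beta_0(t_1)\,\alpha_0(t_1,t_2)\,\alpha_0(t_2,\cdot)\,dt_1\,dt_2 + \cdots + \sum_{\ell=0}^\infty (\mathcal{T}^\ell E)(\cdot).$$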
Hence, the marginal effect of increasing $x_{i,j}$ on $Q(\cdot)$ is
$$\frac{\partial Q(\cdot)}{\partial x_{i,j}} = e_i \beta_{0j}(\cdot) + W_n e_i \int_0^1 \beta_{0j}(t)\,\alpha_0(t,\cdot)\,dt + W_n W_n e_i \int_0^1 \int_0^1 \beta_{0j}(t_1)\,\alpha_0(t_1,t_2)\,\alpha_0(t_2,\cdot)\,dt_1\,dt_2 + \cdots,$$
where $e_i$ denotes the $i$-th column of $I_n$. This clearly shows that a change in $i$'s covariate affects not only the outcome of $i$ but also those of other units through the spatial interaction: the so-called spatial multiplier effect.
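Because $Q$ is linear in $X$, the multiplier effect $\partial Q(\cdot)/\partial x_{i,j}$ equals the solution of the discretized system with right-hand side $e_i \beta_{0j}(\cdot)$. The following sketch assumes a hypothetical four-unit line network, a constant kernel $\alpha_0(t,s) = 0.5$, and $\beta_{0j}(s) = 1 + s$, and compares the full effect with the first two terms of the series.

```python
import numpy as np

n, m = 4, 50
h = 1.0 / m
grid = (np.arange(m) + 0.5) * h                 # midpoint quadrature nodes
W = np.array([[0.0, 1.0, 0.0, 0.0],
              [0.5, 0.0, 0.5, 0.0],
              [0.0, 0.5, 0.0, 0.5],
              [0.0, 0.0, 1.0, 0.0]])            # hypothetical line network, row-normalized
B = h * np.full((m, m), 0.5)                    # hypothetical kernel alpha0(t, s) = 0.5, times weights
beta_j = 1.0 + grid                             # hypothetical beta_{0j}(s) = 1 + s

def solve_Q(F):
    """Solve the discretized system Q = W Q B + F for Q (rows: units, columns: grid points)."""
    K_op = np.eye(n * m) - np.kron(B.T, W)
    return np.linalg.solve(K_op, F.flatten(order="F")).reshape((n, m), order="F")

i = 0
F = np.zeros((n, m))
F[i] = beta_j                                   # e_i beta_{0j}(.): the direct effect
effect = solve_Q(F)                             # dQ(.)/dx_{i,j}, including all spillovers

# First two terms of the series: e_i beta_{0j} + W e_i int beta_{0j}(t) alpha0(t, .) dt
two_terms = F + W @ F @ B
```

Unit 3, three links away from unit 0, still responds, and unit 0's own effect exceeds $\beta_{0j}(\cdot)$ because of feedback through its neighbors.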
2.2 Leading Example: A Distributional SAR Model
One of the situations in which model (1.2) can be most naturally applied empirically is when the outcome function $q_i$ represents the quantile function of the cumulative distribution function (CDF) of a variable of interest. In our empirical analysis, we study the determinants of the population pyramids of Japanese cities by employing the age quantile function of city $i$ as $q_i$.

¹ Clearly, for any given $H \in \mathcal{H}_{n,\infty}$ such that $\|H\|_{\infty,\infty} = 1$, we have $|\{(\mathcal{T}H)(s)\}_i| \le \sum_{j=1}^n |w_{i,j}| \int_0^1 |h_j(t)| \cdot |\alpha_0(t,s)|\,dt \le \|W_n\|_\infty \max_{s \in [0,1]} \int_0^1 |\alpha_0(t,s)|\,dt$.
Suppose that for each $i$ we can observe a random CDF $F_i$ of an outcome variable of interest $y \in \mathcal{Y}_i \subseteq \mathbb{R}$. The quantile function of $y$ for $i$ is defined as $q_i(s) := \inf\{y \in \mathcal{Y}_i : s \le F_i(y)\}$. In the FDA literature, models in which the response variable is a probability distribution have garnered significant attention; see, for example, Petersen and Müller (2016), Han et al. (2020), Yang et al. (2020), Yang (2020), Ghodrati and Panaretos (2022), Petersen et al. (2021), and Chen et al. (2023). For an excellent review of this topic, refer to Petersen et al. (2022). A common view in these studies is that performing a regression analysis directly in the space of CDFs (or densities) is often problematic. Hence, we consider imposing a regression model on the quantile function (rather than on the CDF per se), as in Yang et al. (2020) and Yang (2020), which offers several analytical and interpretational advantages, as described below.
First, unlike CDFs, quantile functions can be easily computed without considering the boundaries of the range. Second, the domains of CDFs are typically heterogeneous across individuals, whereas that of quantile functions is always the fixed interval $[0,1]$. Third, the least-squares regression of the quantile function can be naturally interpreted as a Wasserstein distance minimization problem.
More specifically, for the third point, denoting by $F_i^{\alpha,\beta}$ the CDF induced by the quantile function $q_i^{\alpha,\beta}(s) := \int_0^1 \bar{q}_i(t)\,\alpha(t,s)\,dt + x_i^\top \beta(s)$, the squared 2-Wasserstein distance between $F_i$ and $F_i^{\alpha,\beta}$ is obtained as²
$$\mathcal{W}_2^2(F_i, F_i^{\alpha,\beta}) = \int_0^1 \left( q_i(s) - q_i^{\alpha,\beta}(s) \right)^2 ds.$$
Thus, minimizing the mean squared Wasserstein distance with respect to $(\alpha, \beta)$, that is, $\min_{\alpha,\beta} n^{-1} \sum_{i=1}^n \mathcal{W}_2^2(F_i, F_i^{\alpha,\beta})$, is equivalent to performing a functional least squares regression based on model (1.2). Note, however, that the resulting least squares estimator does not produce a consistent estimate of $(\alpha_0, \beta_0)$ because of the endogeneity of $\bar{q}_i$. To circumvent the endogeneity issue, we introduce a penalized 2SLS method in the next section.
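The Wasserstein identity above can be checked numerically in a case with a known answer: for two univariate Gaussians, $\mathcal{W}_2^2 = (\mu_1 - \mu_2)^2 + (\sigma_1 - \sigma_2)^2$. The sketch below integrates the squared difference of the two quantile functions with a midpoint rule, using Python's standard-library `statistics.NormalDist`; the two distributions are arbitrary examples.

```python
import numpy as np
from statistics import NormalDist

def quantile_on_grid(dist: NormalDist, s: np.ndarray) -> np.ndarray:
    """Evaluate the quantile function (inverse CDF) of `dist` at each level in s."""
    return np.array([dist.inv_cdf(u) for u in s])

m = 20000
s = (np.arange(m) + 0.5) / m                    # midpoint grid on (0, 1); avoids s = 0 and s = 1
q1 = quantile_on_grid(NormalDist(0.0, 1.0), s)
q2 = quantile_on_grid(NormalDist(1.0, 2.0), s)

w2_sq = np.mean((q1 - q2) ** 2)                 # midpoint rule for int_0^1 (q1(s) - q2(s))^2 ds
closed_form = (0.0 - 1.0) ** 2 + (1.0 - 2.0) ** 2   # (mu1 - mu2)^2 + (sigma1 - sigma2)^2
```

The numerical value agrees with the closed form up to a small quadrature error coming from the unbounded tails of the normal quantile function.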
It is also worth noting that the spatially lagged quantile function $\bar{q}_i$ corresponds to the quantile function of the spatially weighted Fréchet mean $\bar{F}_i$ of $\{F_1, \ldots, F_n\}$ at the location of $i$, $\bar{F}_i := \arg\min_F \sum_{j=1}^n w_{i,j} \mathcal{W}_2^2(F, F_j)$, also referred to as the Wasserstein barycenter, provided that $w_{i,j} \ge 0$ and $\sum_{j=1}^n w_{i,j} = 1$. Rather than using $\bar{q}_i$, one might consider employing the quantile function of the spatially lagged CDF $\sum_{j=1}^n w_{i,j} F_j$ as the spatial trend term. However, a linear mixture of CDFs can be multimodal even when each component is unimodal, and it does not, in general, inherit the shape properties of the original CDFs. In this regard, the weighted Fréchet mean should be a more representative and faithful indicator of the neighborhood trend. This point is also highlighted in Gunsilius (2023) in the context of synthetic control analysis.
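In one dimension, the quantile function of the Wasserstein barycenter is the weighted average of the individual quantile functions, which is why $\bar{q}_i$ plays this role. The sketch below contrasts it with the CDF mixture for two hypothetical neighbors, $N(-2,1)$ and $N(2,1)$, with equal weights: the barycenter is $N(0,1)$, while the mixture density is bimodal.

```python
import numpy as np
from statistics import NormalDist

F1, F2 = NormalDist(-2.0, 1.0), NormalDist(2.0, 1.0)   # hypothetical neighbour distributions
w = (0.5, 0.5)                                          # hypothetical spatial weights (sum to one)
s = np.linspace(0.01, 0.99, 99)

# Barycenter quantile function: weighted average of the neighbours' quantile functions.
q_bar = np.array([w[0] * F1.inv_cdf(u) + w[1] * F2.inv_cdf(u) for u in s])
std_normal_q = np.array([NormalDist(0.0, 1.0).inv_cdf(u) for u in s])

# Density of the linear CDF mixture w1 F1 + w2 F2 on a grid of outcome values.
y = np.linspace(-5.0, 5.0, 201)
mix_pdf = np.array([w[0] * F1.pdf(v) + w[1] * F2.pdf(v) for v in y])
```

The mixture density dips at $y = 0$ between peaks near $\pm 2$, whereas the barycenter keeps the unimodal Gaussian shape of its inputs.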
² Formally, the Wasserstein distance is a distance between two probability measures. We abuse notation by using CDFs as its arguments for ease of explanation. For more precise discussions of the properties of the Wasserstein distance, see Panaretos and Zemel (2020), for instance.
3 Estimation and Asymptotics
3.1 Penalized 2SLS estimator
We now discuss the estimation of $\alpha_0$ and $\beta_0$. Let $\{\phi_k : k = 1, 2, \ldots\}$ be a series of basis functions, such as Fourier series, B-splines, or wavelets, such that we can expand $\alpha_0(t,s) = \sum_{k=1}^\infty \phi_k(t)\,\theta_{0k}(s)$ for each $s$. Then, we have $\int_0^1 q_i(t)\,\alpha_0(t,s)\,dt = \sum_{k=1}^\infty r_{i,k}\,\theta_{0k}(s)$, where $r_{i,k} := \int_0^1 q_i(t)\,\phi_k(t)\,dt$. Hence, our model (1.2) can be rewritten as
$$q_i(s) = \sum_{k=1}^K \bar{r}_{i,k}\,\theta_{0k}(s) + x_i^\top \beta_0(s) + \varepsilon_i(s) + u_i(s),$$
where $\bar{r}_{i,k} := \sum_{j=1}^n w_{i,j} r_{j,k}$, $u_i(s) := \int_0^1 \bar{q}_i(t)\,\alpha_0(t,s)\,dt - \sum_{k=1}^K \bar{r}_{i,k}\,\theta_{0k}(s)$, and $K \equiv K_n$ is a sequence of integers tending to infinity as $n$ grows. Note that this is just a multiple regression model with $K$ endogenous regressors $\bar{r}_i = (\bar{r}_{i,1}, \ldots, \bar{r}_{i,K})^\top$ and a composite error term $\varepsilon_i(s) + u_i(s)$. Thus, we can resort to the 2SLS approach to estimate $\theta_0(s) = (\theta_{01}(s), \ldots, \theta_{0K}(s))^\top$ and $\beta_0(s)$, given a sufficient number of valid IVs for $\bar{r}_i$.
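In practice the regressors $\bar{r}_{i,k}$ are computed by numerical integration. The sketch below assumes outcome functions stored on a midpoint grid and uses a hypothetical orthonormal cosine basis; the paper mentions Fourier series, B-splines, and wavelets but does not prescribe this particular basis, and the function names are illustrative.

```python
import numpy as np

def cosine_basis(t, K):
    """Hypothetical orthonormal cosine basis on [0, 1]: phi_1 = 1, phi_k = sqrt(2) cos((k - 1) pi t)."""
    Phi = np.sqrt(2.0) * np.cos(np.pi * np.outer(t, np.arange(K)))
    Phi[:, 0] = 1.0
    return Phi                                    # shape (len(t), K)

def lagged_basis_coefs(Q, W, K):
    """r_bar[i, k] = sum_j w_{i,j} int_0^1 q_j(t) phi_k(t) dt.

    Q: (n, m) array with Q[j, a] = q_j(t_a) on the midpoint grid t_a = (a + 0.5) / m.
    """
    m = Q.shape[1]
    t = (np.arange(m) + 0.5) / m
    r = Q @ cosine_basis(t, K) / m                # midpoint rule: r[j, k] ~= int q_j phi_k dt
    return W @ r                                  # spatial lag: r_bar = W_n r
```

The rows of the returned array are the vectors $\bar{r}_i^\top$, i.e., the matrix $\bar{R}$ used in the 2SLS estimator.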
Suppose that we have an $L \times 1$ vector $z_{1,i}$ of IVs that are correlated with $\bar{r}_i$ but not with $\varepsilon_i$, with $L \equiv L_n \ge K_n$. The choice of IVs will be discussed later. Further, let $z_i = (z_{1,i}^\top, x_i^\top)^\top$, $Z = (z_1, \ldots, z_n)^\top$, $\bar{R} = (\bar{r}_1, \ldots, \bar{r}_n)^\top$, $M_z = Z(Z^\top Z)^- Z^\top$, $M_x = X(X^\top X)^{-1} X^\top$, and $\bar{R}_x = (I_n - M_x)\bar{R}$. Then, our 2SLS estimator is defined as follows:
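$$\hat{\beta}_n(s) := \left[ X^\top (I_n - S) X \right]^{-1} X^\top (I_n - S) Q(s),$$
$$\hat{\theta}_n(s) := \left[ \bar{R}_x^\top M_z \bar{R}_x + \lambda D_n \right]^{-1} \bar{R}_x^\top M_z Q(s) \tag{3.1}$$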
for a given evaluation point $s \in (0,1)$, where $S = M_z \bar{R} [\bar{R}^\top M_z \bar{R}]^- \bar{R}^\top M_z$, $\lambda \equiv \lambda_n$ is a non-negative regularization parameter tending to zero as $n$ increases, and $D \equiv D_n$ denotes a $K \times K$ symmetric positive semidefinite matrix satisfying $\rho_{\max}(D) \lesssim 1$ uniformly in $K$. Once $\hat{\theta}_n(s) = (\hat{\theta}_{n1}(s), \ldots, \hat{\theta}_{nK}(s))^\top$ is obtained, we can estimate $\alpha_0(\cdot, s)$ by
$$\hat{\alpha}_n(\cdot, s) := \sum_{k=1}^K \phi_k(\cdot)\,\hat{\theta}_{nk}(s).$$
To recover the entire functional forms of $\alpha_0(\cdot, \cdot)$ and $\beta_0(\cdot)$, we can repeat this estimation procedure over a sufficiently fine grid on $[0,1]$.
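The estimator (3.1) at a single evaluation point $s$ amounts to a few lines of linear algebra. The sketch below is one possible NumPy implementation, assuming dense $n \times n$ projection matrices (adequate for moderate $n$) and using `np.linalg.pinv` for the generalized inverses; the function name `penalized_2sls` is hypothetical, and the instrument matrix `Z` is assumed to include the columns of `X`, as in $z_i = (z_{1,i}^\top, x_i^\top)^\top$.

```python
import numpy as np

def penalized_2sls(Q_s, X, Z, R, lam, D):
    """Sketch of (3.1) at one evaluation point s.

    Q_s: (n,) outcomes q_i(s); X: (n, d_x) covariates; Z: (n, L + d_x) instruments
    including X; R: (n, K) lagged basis coefficients r_bar_i; D: (K, K) penalty matrix.
    """
    n = X.shape[0]
    Mz = Z @ np.linalg.pinv(Z.T @ Z) @ Z.T                 # projection onto col(Z)
    Mx = X @ np.linalg.solve(X.T @ X, X.T)                 # projection onto col(X)
    Rx = (np.eye(n) - Mx) @ R                              # R_bar_x = (I_n - M_x) R_bar
    S = Mz @ R @ np.linalg.pinv(R.T @ Mz @ R) @ R.T @ Mz   # projection onto col(M_z R_bar)
    I_S = np.eye(n) - S
    beta_hat = np.linalg.solve(X.T @ I_S @ X, X.T @ I_S @ Q_s)
    theta_hat = np.linalg.solve(Rx.T @ Mz @ Rx + lam * D, Rx.T @ Mz @ Q_s)
    return beta_hat, theta_hat
```

Forming $M_z$ explicitly costs $O(n^2)$ memory; for large $n$, one would instead work with QR factorizations of $Z$ and $X$. With $D = I_K$, the penalty acts like a ridge term that shrinks $\hat{\theta}_n(s)$ toward zero.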
Remark 3.1 (Choice of instruments). Observing that $\bar{q}_i(s) \approx \int_0^1 \bar{\bar{q}}_i(t)\,\alpha_0(t,s)\,dt + \bar{x}_i^\top \beta_0(s)$, where $\bar{\bar{q}}_i(t) := \sum_{j=1}^n w_{i,j} \bar{q}_j(t)$ and $\bar{x}_i := \sum_{j=1}^n w_{i,j} x_j$, the spatially lagged covariates $\bar{x}_i$ would be natural IV candidates for $\bar{r}_{i,k} = \int_0^1 \bar{q}_i(t)\,\phi_k(t)\,dt$, assuming that $\beta_0 \neq 0$. For identification, the number of valid IVs must be greater than or equal to $K$. While consistent estimation of $\alpha_0(\cdot, s)$ theoretically requires $K$ to tend to infinity as $n$ increases, the dimension $d_x$ of $x_i$ is fixed in our model. Note that, as long as both $\alpha_0$ and $\beta_0$ are non-degenerate, it is possible to create arbitrarily many IVs by taking spatial lags of $x_i$ of higher and higher order: $\bar{x}_i$, $\bar{\bar{x}}_i$, and so forth. However, the higher the order, the weaker the instruments. Since $K$ is at most about eight for most practical sample sizes, we believe that finding sufficient IVs is not a serious concern in most empirical situations where researchers can collect a reasonable number of independent variables. See Remark 3.2 below for a related discussion.
In practice, the 2SLS estimator in (3.1) is rarely feasible because the function $q_i$ can usually only be observed incompletely. For instance, we might only be able to observe the values of $q_i$ at finitely many points, $q_i(s_{i,1}), \ldots, q_i(s_{i,m_i})$. This is the case in our empirical analysis of the age distribution in Japanese cities: we cannot access the complete age distribution of each city, but only know the distribution in five-year age intervals. In such a case, we can, for example, apply linear interpolation to obtain an approximation of the entire functional form of $q_i$. Without loss of generality, suppose the observation points are in increasing order: $s_{i,1} \le s_{i,2} \le \cdots \le s_{i,m_i}$. Then, for each given $s \in [s_{i,l}, s_{i,l+1}]$, we estimate $q_i(s)$ by
$$\hat{q}_i(s) = \omega_i(s)\,q_i(s_{i,l}) + (1 - \omega_i(s))\,q_i(s_{i,l+1}), \tag{3.2}$$
where $\omega_i(s) = (s_{i,l+1} - s)/(s_{i,l+1} - s_{i,l})$. When $s < s_{i,1}$ (resp. $s > s_{i,m_i}$), we can set $\hat{q}_i(s) = q_i(s_{i,1})$ (resp. $\hat{q}_i(s) = q_i(s_{i,m_i})$).
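Equation (3.2), including the constant extension outside $[s_{i,1}, s_{i,m_i}]$, can be written as follows. This sketch assumes strictly increasing observation points (ties would make $\omega_i(s)$ undefined); `interp_quantile` is a hypothetical name.

```python
import numpy as np

def interp_quantile(s, s_obs, q_obs):
    """Linear interpolation (3.2) of q_i between observed points, held constant outside them.

    Assumes s_obs is strictly increasing; q_obs[l] = q_i(s_obs[l]).
    """
    s = np.atleast_1d(np.asarray(s, dtype=float))
    s_obs = np.asarray(s_obs, dtype=float)
    q_obs = np.asarray(q_obs, dtype=float)
    # Index l such that s lies in [s_obs[l], s_obs[l + 1]], clipped to a valid interval.
    l = np.clip(np.searchsorted(s_obs, s, side="right") - 1, 0, len(s_obs) - 2)
    omega = (s_obs[l + 1] - s) / (s_obs[l + 1] - s_obs[l])        # omega_i(s)
    q_hat = omega * q_obs[l] + (1.0 - omega) * q_obs[l + 1]
    q_hat = np.where(s < s_obs[0], q_obs[0], q_hat)                # s < s_{i,1}
    return np.where(s > s_obs[-1], q_obs[-1], q_hat)               # s > s_{i,m_i}
```

For strictly increasing points, this coincides with NumPy's `np.interp(s, s_obs, q_obs)`, which also holds the endpoint values constant outside the observed range.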
When $q_i$ is a quantile function, it is also typical that only a finite sample $\{y_{i,1}, \ldots, y_{i,m_i}\}$ randomly drawn from $F_i$ is available. In this case, a straightforward approach to estimating $q_i$ is to perform nonparametric kernel CDF estimation and invert the estimate. Alternatively, we can also use a simple interpolation method as described in Yang (2020).
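The kernel-CDF route can be sketched as follows, assuming a Gaussian kernel with a user-chosen bandwidth `h` and inversion by bisection. The function names and the bracketing interval are illustrative choices, not the paper's procedure.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def kernel_cdf(y, sample, h):
    """Smoothed CDF estimate: average of Phi((y - Y_j) / h) over the sample (Gaussian kernel)."""
    z = (np.asarray(y, dtype=float)[..., None] - sample) / (h * math.sqrt(2.0))
    return 0.5 * (1.0 + _erf(z)).mean(axis=-1)

def kernel_quantile(s, sample, h, iters=60):
    """Invert the kernel CDF at each level in s by bisection."""
    s = np.atleast_1d(np.asarray(s, dtype=float))
    lo = np.full(s.shape, sample.min() - 10.0 * h)   # kernel CDF is ~0 here
    hi = np.full(s.shape, sample.max() + 10.0 * h)   # kernel CDF is ~1 here
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        below = kernel_cdf(mid, sample, h) < s       # root lies to the right of mid
        lo = np.where(below, mid, lo)
        hi = np.where(below, hi, mid)
    return 0.5 * (lo + hi)
```

Because the smoothed CDF is continuous and strictly increasing, the inverted quantile function is well defined and monotone, unlike a naive inversion of the step-shaped empirical CDF.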
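Letting $\hat{q}_i$ be any estimator of $q_i$, compute $\hat{\bar{r}}_{i,k} := \sum_{j=1}^n w_{i,j} \int_0^1 \hat{q}_j(t)\,\phi_k(t)\,dt$ and let $\hat{\bar{r}}_i = (\hat{\bar{r}}_{i,1}, \ldots, \hat{\bar{r}}_{i,K})^\top$. Now, the feasible version of (3.1) is defined as
$$\tilde{\beta}_n(s) := \left[ X^\top (I_n - \hat{S}) X \right]^{-1} X^\top (I_n - \hat{S}) \hat{Q}(s),$$
$$\tilde{\theta}_n(s) := \left[ \hat{\bar{R}}_x^\top M_z \hat{\bar{R}}_x + \lambda D_n \right]^{-1} \hat{\bar{R}}_x^\top M_z \hat{Q}(s),$$
where $\hat{Q}(s) = (\hat{q}_1(s), \ldots, \hat{q}_n(s))^\top$, $\hat{\bar{R}} = (\hat{\bar{r}}_1, \ldots, \hat{\bar{r}}_n)^\top$, $\hat{\bar{R}}_x = (I_n - M_x) \hat{\bar{R}}$, and $\hat{S} = M_z \hat{\bar{R}} [\hat{\bar{R}}^\top M_z \hat{\bar{R}}]^- \hat{\bar{R}}^\top M_z$. The estimator of $\alpha_0(\cdot, s)$ can then be obtained as $\tilde{\alpha}_n(\cdot, s) := \sum_{k=1}^K \phi_k(\cdot)\,\tilde{\theta}_{nk}(s)$.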
3.2 Convergence rates and limiting distributions
To derive the asymptotic properties of our estimators, we first need to specify the structure of our sampling space. Following Jenish and Prucha (2012), let $D \subset \mathbb{R}^d$, $1 \le d < \infty$, be a possibly uneven lattice, and let $D_n \subset D$ be the set of observation locations, which may differ across $n$. For spatial data, $D$ would be a geographical space with $d = 2$. Notably, $D$ need not be exactly observable to us. For example, $D$ may be a complex space of general social and economic characteristics. In this case, we can regard it as an embedding of individuals in a latent space rather than their physical locations.
Assumption 3.1. (i) The maximum coordinate difference between any two observations $i, j \in D$, which we denote as $\Delta(i,j)$, is at least (without loss of generality) 1; and (ii) a threshold distance $\bar{\Delta}$ exists such that $w_{i,j} = 0$ if $\Delta(i,j) > \bar{\Delta}$.
Assumptions 3.1(i) and (ii) together imply that the number of interacting neighbors for each unit
is bounded. We believe this is not too restrictive in practice.
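Assumption 3.2. (i) $\{z_i\}_{i=1}^n$ are non-stochastic and uniformly bounded; and (ii) $\lim_{n \to \infty} Z^\top Z/n$ exists and is nonsingular.
Assumption 3.3. (i) For all $i$, $\varepsilon_i \in L^p(0,1)$ for some $2 \le p < \infty$; (ii) $\{\varepsilon_i\}_{i=1}^n$ are independent; and (iii) $E[\varepsilon_i(s)] = 0$ for all $i$, $\inf_{1 \le i \le n;\, n \ge 1} \|\varepsilon_i(s)\|_2 > 0$, and $\sup_{1 \le i \le n;\, n \ge 1} \|\varepsilon_i(s)\|_4 \lesssim 1$.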
Assumption 3.2(i) states that the covariates and instruments are non-stochastic. This type of assumption has often been used in the literature on spatial econometrics and many-IV estimation (e.g., Kelejian and Prucha, 2010; Hausman et al., 2012). Note that this assumption is essentially equivalent to treating all stochastic arguments as conditional on $\{z_i\}_{i=1}^n$. Assumption 3.3 restricts the distribution of the error functions while accommodating virtually any form of heteroscedasticity. The independence assumption in (ii) could possibly be relaxed to a weak dependence condition, but we maintain it for technical simplicity. The $s$ in (iii) is a given interior point of $[0,1]$ at which the estimation is performed.
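Assumption 3.4. (i) For all $k$, $\phi_k \in L^2(0,1)$; (ii) $\|\alpha_0(\cdot, s) - \phi^K(\cdot)^\top \theta_0(s)\|_{L^2} \le \ell_K(s)$, where $\phi^K = (\phi_1, \ldots, \phi_K)^\top$; and (iii) $\rho_{\max}\left(\int_0^1 \phi^K(t)\,\phi^K(t)^\top dt\right) \lesssim 1$.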
Assumption 3.4 imposes a set of conditions on the basis functions. The $L^2$-convergence rate of the approximation errors for various bases is discussed in Belloni et al. (2015), where it is shown that $\ell_K(s) \lesssim K^{-\pi}$ typically holds when $\alpha_0(\cdot, s)$ is a $\pi$-smooth function (i.e., belongs to the Hölder class of smoothness order $\pi$).
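The decay of $\ell_K(s)$ in Assumption 3.4(ii) can be illustrated by projecting a smooth stand-in for $\alpha_0(\cdot, s)$ onto the first $K$ basis functions. The sketch below assumes $f(t) = e^t$ and a cosine basis, both hypothetical choices; with these, the $L^2$ error should shrink roughly like $K^{-3/2}$, because the cosine coefficients of $e^t$ decay like $k^{-2}$.

```python
import numpy as np

def cosine_basis(t, K):
    """Hypothetical orthonormal cosine basis on [0, 1]: phi_1 = 1, phi_k = sqrt(2) cos((k - 1) pi t)."""
    Phi = np.sqrt(2.0) * np.cos(np.pi * np.outer(t, np.arange(K)))
    Phi[:, 0] = 1.0
    return Phi

m = 20000
t = (np.arange(m) + 0.5) / m                     # midpoint quadrature nodes
f = np.exp(t)                                    # smooth stand-in for alpha0(., s)

errors = []
for K in (2, 4, 8, 16, 32):
    Phi = cosine_basis(t, K)
    theta = Phi.T @ f / m                        # projection coefficients int f phi_k dt
    resid = f - Phi @ theta
    errors.append(np.sqrt(np.mean(resid ** 2)))  # ||f - phi^K' theta||_{L2}
```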
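Assumption 3.5. (i) $\rho_{\max}(E[\bar{R}^\top Z/n]\,E[Z^\top \bar{R}/n]),\ \rho_{\max}(E[\bar{R}_x^\top Z/n]\,E[Z^\top \bar{R}_x/n]) \lesssim 1$; and (ii) there exists $\nu_{KL} > 0$ such that $\nu_{KL} \le \liminf_{n \to \infty} \rho_{\min}(E[\bar{R}^\top Z/n]\,E[Z^\top \bar{R}/n])$ and $\nu_{KL} \le \liminf_{n \to \infty} \rho_{\min}(E[\bar{R}_x^\top Z/n]\,E[Z^\top \bar{R}_x/n])$.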
Remark 3.2 (Potentially weak identification of $\alpha_0$). The $\nu_{KL}$ in Assumption 3.5(ii) governs the strength of identification of $\alpha_0$ and is conceptually equivalent to the ill-posedness of estimation in high-dimensional IV regression models (Breunig et al., 2020). It is important to note that the ill-posedness problem in our context is a practical concern, unlike the intrinsically ill-posed nature of nonparametric IV models (e.g., Blundell et al., 2007; Hoshino, 2022). As mentioned in Remark 3.1, our model assumes only a finite number of exogenous variables (i.e., $x_i$), while the number of endogenous variables grows to infinity. One potential strategy for constructing a sufficient number of IVs is to use higher-order spatial lags of $x_i$. However, as the order of the spatial lags increases, their correlation with the endogenous variables inevitably weakens, and the IVs themselves typically become more collinear. This leads to ill-posedness, which slows down the rate of convergence and inflates the variance of our estimator. A similar discussion can be found in Tchuente (2019). We introduce the penalty term $\lambda D$ to control the variance inflation by restricting the flexibility of the estimated function.
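To state the next assumption, we define the following matrices: $V_n(s) := \mathrm{diag}\{E[\varepsilon_1^2(s)], \ldots, E[\varepsilon_n^2(s)]\}$,
$$\Omega_{n,x}(s) := \Psi_{n,x}^\top V_n(s) \Psi_{n,x}/n,$$
$$\Psi_{n,x} := X - Z(Z^\top Z/n)^- E(Z^\top \bar{R}/n) \left[ E\bar{R}^\top M_z E\bar{R}/n \right]^{-1} E(\bar{R}^\top X/n),$$
$$\Sigma_{n,x} := X^\top X/n - E(X^\top \bar{R}/n) \left[ E\bar{R}^\top M_z E\bar{R}/n \right]^{-1} E(\bar{R}^\top X/n).$$
Assumption 3.6. $\Sigma_x := \lim_{n \to \infty} \Sigma_{n,x}$ and $\Omega_x(s) := \lim_{n \to \infty} \Omega_{n,x}(s)$ exist and are nonsingular.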
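The next theorem gives the convergence rate of our estimator.
Theorem 3.1. Suppose Assumptions 2.1 and 3.1–3.6 hold. In addition, assume $L\sqrt{K}/(\nu_{KL}^2 \sqrt{n}) \lesssim 1$. Then, we have
(i) $\|\hat{\beta}_n(s) - \beta_0(s)\| \lesssim_p n^{-1/2}$, and (ii)
$$\|\hat{\alpha}_n(\cdot, s) - \alpha_0(\cdot, s)\|_{L^2} \lesssim_p \frac{\sqrt{K}/\sqrt{n} + \ell_K(s)}{\sqrt{\nu_{KL} + \lambda \rho_D}} + \frac{\lambda \|\theta_0(s)\|_D}{\nu_{KL} + \lambda \rho_D},$$
where $\rho_D := \rho_{\min}(D)$ and $\|\theta_0(s)\|_D := \sqrt{\theta_0(s)^\top D\, \theta_0(s)}$.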
The proofs of Theorem 3.1 and of the results presented below are similar in several parts to those in Hoshino (2022), but for completeness, they are all presented in Appendix A. Theorem 3.1(i) shows that the coefficients of $x_i$ can be estimated at the root-$n$ rate. Meanwhile, result (ii) indicates that the $L^2$-convergence rate of $\hat{\alpha}_n(\cdot, s)$ is nonstandard owing to the potentially weak identification and the presence of the penalty term $\lambda D$. We observe a trade-off: the first term converges to zero more quickly when a large $\lambda$ is selected, whereas the second term can vanish if $\lambda$ diminishes sufficiently fast that $\nu_{KL}/\lambda \to \infty$. Clearly, $\|\theta_0(s)\|_D$ is at most of order $\sqrt{K}$. When $\theta_0(s)$ is a sparse vector or its elements decay with the order of the basis expansion, $\|\theta_0(s)\|_D \lesssim 1$ might be possible.
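Next, define $\sigma_{n,\lambda}(t,s) := \sqrt{\phi^K(t)^\top \Sigma_{n,r,\lambda}^{-1} \Omega_{n,r}(s) \Sigma_{n,r,\lambda}^{-1} \phi^K(t)}$, $\Sigma_{n,r,\lambda} := E\bar{R}_x^\top M_z E\bar{R}_x/n + \lambda D$, and $\Omega_{n,r}(s) := E\bar{R}_x^\top M_z V_n(s) M_z E\bar{R}_x/n$. Moreover, let
$$\hat{C}_n(s) := \left[ X^\top (I_n - S) X/n \right]^{-1} \left( X^\top (I_n - S) \hat{V}_n(s) (I_n - S) X/n \right) \left[ X^\top (I_n - S) X/n \right]^{-1},$$
$$[\hat{\sigma}_{n,\lambda}(t,s)]^2 := \phi^K(t)^\top \left[ \bar{R}_x^\top M_z \bar{R}_x/n + \lambda D \right]^{-1} \left( \bar{R}_x^\top M_z \hat{V}_n(s) M_z \bar{R}_x/n \right) \left[ \bar{R}_x^\top M_z \bar{R}_x/n + \lambda D \right]^{-1} \phi^K(t),$$
where $\hat{V}_n(s) := \mathrm{diag}\{\hat{\varepsilon}_1^2(s), \ldots, \hat{\varepsilon}_n^2(s)\}$ and $\hat{\varepsilon}_i(s) := q_i(s) - \bar{r}_i^\top \check{\theta}_n(s) - x_i^\top \hat{\beta}_n(s)$, with $\check{\theta}_n(s)$ denoting the estimator of $\theta_0(s)$ obtained from (3.1) with $\lambda$ set to zero. Then, the limiting distribution of our estimator can be characterized as in the following theorem.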
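Theorem 3.2. Suppose Assumptions 2.1 and 3.1–3.6 hold. In addition, assume
$$K \sim L, \quad K^3/(\nu_{KL}^4 n) \to 0, \quad \sqrt{n}\,\ell_K(s)/\sqrt{\nu_{KL}} \to 0, \quad \sqrt{n}\,|\phi_K(t)^\top\theta_0(s) - \alpha_0(t,s)|/\|\phi_K(t)\| \to 0, \quad \lambda/\nu_{KL}^2 \to 0, \quad \sqrt{n}\,\lambda\|\theta_0(s)\|_D/\nu_{KL} \to 0.$$
Then, we have
(i) $\sqrt{n}(\hat\beta_n(s) - \beta_0(s)) \xrightarrow{d} N(0, \Sigma_x^{-1}\Omega_x(s)\Sigma_x^{-1})$, (ii) $\dfrac{\sqrt{n}(\hat\alpha_n(t,s) - \alpha_0(t,s))}{\sigma_{n,\lambda}(t,s)} \xrightarrow{d} N(0,1)$,
(iii) $\left\|\hat C_n(s) - \Sigma_x^{-1}\Omega_x(s)\Sigma_x^{-1}\right\| = o_P(1)$, and (iv) $|\hat\sigma_{n,\lambda}(t,s) - \sigma_{n,\lambda}(t,s)| = o_P(1)$.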
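As a rough illustration of how Theorem 3.2(ii) and (iv) might be used in practice, the Python sketch below evaluates the plug-in standard error $\hat\sigma_{n,\lambda}(t,s)$ from the sandwich formula above and forms the implied pointwise normal interval $\hat\alpha_n(t,s) \pm z\,\hat\sigma_{n,\lambda}(t,s)/\sqrt{n}$. It assumes that $R_x$, $M_z$, $\phi_K(t)$, and the residuals $\hat\varepsilon_i(s)$ have already been computed as defined in the main text and are passed in as arrays, and that $M_z$ and $D$ are symmetric; the function names are hypothetical.

```python
import numpy as np

def sigma_hat(phi_t, R_x, M_z, resid, lam, D):
    """Plug-in sigma_hat_{n,lambda}(t, s) from the sandwich formula.

    phi_t : (K,) basis vector phi_K(t);  R_x : (n, K);  M_z : (n, n), assumed symmetric;
    resid : (n,) residuals eps_hat_i(s);  lam : penalty;  D : (K, K) symmetric penalty matrix.
    """
    n = R_x.shape[0]
    A = R_x.T @ M_z @ R_x / n + lam * D                        # "bread": R' M_z R / n + lambda D
    meat = R_x.T @ M_z @ np.diag(resid ** 2) @ M_z @ R_x / n   # uses V_hat_n(s) = diag(eps_hat^2)
    A_inv_phi = np.linalg.solve(A, phi_t)                       # A is symmetric, so phi' A^{-1} = (A^{-1} phi)'
    return float(np.sqrt(A_inv_phi @ meat @ A_inv_phi))

def pointwise_ci(alpha_hat_ts, sig, n, z=1.959964):
    """Pointwise interval implied by Theorem 3.2(ii); z = 1.959964 gives roughly 95% coverage."""
    half = z * sig / np.sqrt(n)
    return alpha_hat_ts - half, alpha_hat_ts + half
```

Solving the linear system instead of forming $[R_x^\top M_z R_x/n + \lambda D]^{-1}$ explicitly is a standard numerical choice and does not change the formula.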
Remark 3.3 (Choice of tuning parameters). To implement our estimator, we need to select three tuning parameters: $\lambda$, $K$, and $L$. Given the assumptions in Theorem 3.2, the penalty parameter $\lambda$ must converge to zero faster than $n^{-1/2}$. In the numerical studies presented below, we set $\lambda \sim n^{-3/5}$. For the order of the basis expansion, assume that $L \sim K \sim n^k$ for some $k > 0$. We further assume that the ill-posedness is mild, such that $\nu_{KL} \gtrsim K^{-\nu}$ for some $\nu > 0$, and that $\alpha_0(\cdot,s)$ is a $\pi$-smooth function such that $\ell_K(s) \lesssim K^{-\pi}$. Then, simple calculations show that $k$ must satisfy $1/(2\pi - \nu) < k < 1/(3 + 4\nu)$ to ensure the asymptotic normality results. This indicates that when the IVs are not strong, a modest $K$ should be employed. In Section 4, we numerically examine the impact of the choice of tuning parameters. The results demonstrate that the choice of $\lambda$ is more influential on estimation performance than that of $K$. More sophisticated, data-driven methods for choosing tuning parameters will be investigated in future studies.
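The two rate conditions behind the range for $k$ in Remark 3.3 can be checked mechanically. The sketch below treats the bounds in the remark as exact polynomial rates ($\nu_{KL}$ of order $K^{-\nu}$, $\ell_K(s)$ of order $K^{-\pi}$), ignores all other conditions of Theorem 3.2, and returns the admissible interval for $k$, or None when it is empty; the function name is hypothetical.

```python
def admissible_k_range(pi, nu):
    """Range of k (with L ~ K ~ n^k) implied by two rate conditions of Theorem 3.2.

    Treats nu_KL ~ K^(-nu) and l_K(s) ~ K^(-pi) as exact rates (Remark 3.3):
      sqrt(n) * l_K / sqrt(nu_KL) -> 0   requires  k > 1 / (2*pi - nu),
      K^3 / (nu_KL^4 * n)      -> 0   requires  k < 1 / (3 + 4*nu).
    Returns (lower, upper), or None if no k satisfies both.
    """
    if 2 * pi - nu <= 0:          # the bias condition cannot hold for any k > 0
        return None
    lower, upper = 1.0 / (2 * pi - nu), 1.0 / (3 + 4 * nu)
    return (lower, upper) if lower < upper else None

print(admissible_k_range(3.0, 0.2))  # roughly (0.172, 0.263)
print(admissible_k_range(2.0, 0.5))  # None: the two conditions conflict
```

The second call illustrates the remark's point: with stronger ill-posedness or a less smooth $\alpha_0$, no polynomial growth rate of $K$ satisfies both conditions.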
3.3 Testing the presence of spatial effects
In this section, we consider statistically testing the presence of spatial effects. Specifically, for each given $s$, we test the following null hypothesis:
$$H_0: \alpha_0(t,s) = 0 \text{ for almost every } t \in \mathcal{I},$$
where $\mathcal{I}$ denotes a non-degenerate subinterval of $[0,1]$. A natural statistic for testing $H_0$ is the following Wald-type statistic:
$$T_n := n\int_{\mathcal{I}}\hat\alpha_n^2(t,s)\,dt,$$
where the dependence of $T_n$ on $s$ is suppressed. To derive the asymptotic distribution of $T_n$ under $H_0$, let $\Xi_n := \Sigma_{n,r,\lambda}^{-1}\mathbb{E}(R_x^\top Z/n)(Z^\top Z/n)^-$ and $\Phi_{\mathcal{I}} := \int_{\mathcal{I}}\phi_K(t)\phi_K(t)^\top dt$. Further, define
$$\mu_n := \mathrm{tr}\left\{\Xi_n^\top\Phi_{\mathcal{I}}\Xi_n(Z^\top V_n(s)Z/n)\right\},$$
$$v_n := 2\,\mathrm{tr}\left\{\Xi_n^\top\Phi_{\mathcal{I}}\Xi_n(Z^\top V_n(s)Z/n)\,\Xi_n^\top\Phi_{\mathcal{I}}\Xi_n(Z^\top V_n(s)Z/n)\right\},$$
which serve as the mean and variance of $T_n$, respectively.
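The quantities above translate directly into matrix code. The Python sketch below computes $T_n$ by a Riemann sum over an equally spaced grid on $\mathcal{I}$, together with sample analogues of $\mu_n$ and $v_n$ in which $V_n(s)$ is replaced by $\mathrm{diag}\{\hat\varepsilon_i^2(s)\}$. It assumes that a plug-in version of $\Xi_n$ (with $K$ rows), $\Phi_{\mathcal{I}}$, the instrument matrix $Z$, and the residuals are already available; these inputs and the function names are hypothetical. The optional a_grid argument covers a null of the form $\alpha_0(t,s) = a(t)$.

```python
import numpy as np

def wald_statistic(alpha_hat_grid, dt, n, a_grid=None):
    """T_n = n * integral over I of (alpha_hat(t, s) - a(t))^2 dt, via a Riemann sum.

    alpha_hat_grid holds alpha_hat_n(t, s) at equally spaced points t in I (spacing dt);
    a_grid = None corresponds to H0: alpha_0(t, s) = 0.
    """
    diff = np.asarray(alpha_hat_grid, dtype=float)
    if a_grid is not None:
        diff = diff - np.asarray(a_grid, dtype=float)
    return n * float(np.sum(diff ** 2) * dt)

def wald_mean_variance(Xi, Phi_I, Z, resid):
    """Sample analogues of mu_n and v_n, with V_n(s) replaced by diag(resid ** 2)."""
    n = Z.shape[0]
    ZVZ = Z.T @ (resid[:, None] ** 2 * Z) / n    # Z' diag(resid^2) Z / n without forming the diagonal matrix
    B = Xi.T @ Phi_I @ Xi @ ZVZ                  # Xi' Phi_I Xi (Z' V Z / n)
    return float(np.trace(B)), float(2.0 * np.trace(B @ B))
```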
Here, we introduce the following miscellaneous assumptions.
Assumption 3.7. (i) $\sup_{1\le i\le n;\, n\ge1}\|\varepsilon_i(s)\|_6 \lesssim 1$; and (ii) $0 < \rho_{\min}(\Phi_{\mathcal{I}}) \le \rho_{\max}(\Phi_{\mathcal{I}}) \lesssim 1$.
The next theorem characterizes the asymptotic distribution of our test statistic.
Theorem 3.3. Suppose Assumption 3.7 and the assumptions in Theorem 3.2 are all satisfied. In addition, assume $1/(K\nu_{KL}^2) \to 0$ and $K^3/(\nu_{KL}^5 n) \to 0$. Then, we have $(T_n - \mu_n)/\sqrt{v_n} \xrightarrow{d} N(0,1)$.

When $H_0$ does not hold, the standardized statistic $(T_n - \mu_n)/\sqrt{v_n}$ tends to take large positive values. Thus, in view of Theorem 3.3, we can reject $H_0$ at the $100\alpha\%$ significance level if the realized value of $(T_n - \mu_n)/\sqrt{v_n}$ exceeds the upper $\alpha$-quantile of $N(0,1)$. To implement the test in practice, we need to consistently estimate $\mu_n$ and $v_n$, which can easily be done with their sample analogues, whose definitions should be clear from the context. The consistency of these estimators is straightforward (see Lemmas A.3 and A.4 and Theorem 3.2(iii) and (iv)).
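The one-sided decision rule can be written in a few lines. This is a minimal sketch assuming the realized $T_n$ and estimates of $\mu_n$ and $v_n$ are given; it uses the standard library's statistics.NormalDist for the normal quantile and tail probability, and the function name is hypothetical.

```python
from statistics import NormalDist

def wald_test(T_n, mu_n, v_n, level=0.05):
    """One-sided test using (T_n - mu_n) / sqrt(v_n) -> N(0, 1) under H0 (Theorem 3.3).

    Returns the standardized statistic, its upper-tail p-value, and the rejection decision.
    """
    z = (T_n - mu_n) / v_n ** 0.5
    critical = NormalDist().inv_cdf(1.0 - level)   # upper level-quantile of N(0, 1)
    p_value = 1.0 - NormalDist().cdf(z)
    return z, p_value, z > critical
```

Only the upper tail is used because, as noted above, departures from $H_0$ push the standardized statistic upward.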
Remark 3.4. The proposed test can easily be extended to a more general null hypothesis $H_0: \alpha_0(t,s) = a(t)$ for $t \in \mathcal{I}$, where $a(\cdot)$ is a given function pre-specified by the researcher (or estimable at a certain convergence rate). The resulting test statistic takes the form $T_n = n\int_{\mathcal{I}}(\hat\alpha_n(t,s) - a(t))^2\,dt$, and $H_0$ can be tested using the same procedure as above.
Finally, note that when $H_0: \alpha_0(t,s) = 0$ is true over the entire interval $[0,1]$, higher-order spatially lagged covariates are not valid IVs; that is, for example, $x_i$ and $q_i$ are not related to each other. Thus, in this case, we essentially need to prepare a sufficient number of IVs using only $x_i$ and possibly its transformations.
3.4 Asymptotic properties under interpolated outcome functions
Finally, in this section, we examine the case in which the outcome functions are only discretely observed and are linearly interpolated following (3.2). Letting $s_{i,0} = 0$ and $s_{i,m_i+1} = 1$ for all $i$, we introduce the following assumption.
Assumption 3.8. For all $i$, (i) there exists a positive sequence $\kappa \equiv \kappa_n$ tending to zero as $n$ increases such that $|s_{i,l+1} - s_{i,l}| \lesssim \kappa$ for all $l = 0, 1, \ldots, m_i$; and (ii) there exists a constant $\xi \in (0,1]$ such that $|q_i(s_1) - q_i(s_2)| \lesssim |s_1 - s_2|^\xi$ for any $s_1, s_2 \in [0,1]$.
Assumption 3.8(i) determines the overall precision of the linear interpolation approximation. For
simplicity of discussion, it assumes that the values of the outcome function are (quasi) uniformly
observed such that the distance of any two consecutive observations is of order κ. In addition, note that
we treat each observation point as nonstochastic. Assumption 3.8(ii) requires that the outcome function
is Hölder continuous with exponent $\xi$ for all $i$. This assumption may be somewhat restrictive, but similar
assumptions are often considered in the FDA literature (e.g., Crambes et al., 2009). Obviously, we need
some form of continuity in order for the interpolation approximation to work.
The following theorem states that the approximation errors caused by the linear interpolation are asymptotically negligible if $\kappa^\xi$ is sufficiently small.
Theorem 3.4. Suppose Assumption 3.8 and those in Theorem 3.2 are all satisfied. In addition, assume $\sqrt{n}\,\kappa^\xi/\sqrt{\nu_{KL}} \to 0$. Then, $\tilde\beta_n(s)$ and $\tilde\alpha_n(\cdot,s)$ are asymptotically equivalent to $\hat\beta_n(s)$ and $\hat\alpha_n(\cdot,s)$, respectively.
Under Assumption 3.8, the approximation error $|\hat q_i(s) - q_i(s)|$ is of order $\kappa^\xi$ uniformly in $s$. The condition $\sqrt{n}\,\kappa^\xi/\sqrt{\nu_{KL}} \to 0$ requires the interpolation error to shrink to zero faster than $n^{-1/2}$ (more precisely, faster than $\sqrt{\nu_{KL}/n}$), similar to the requirement on the basis approximation error $\ell_K(s)$ in Theorem 3.2. From this result, it is also straightforward to show the asymptotic equivalence between the feasible Wald statistic $\tilde T_n := n\int_{\mathcal{I}}\tilde\alpha_n^2(t,s)\,dt$ and $T_n$ presented in the previous section.
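To see why the interpolation error is of order $\kappa^\xi$, the short check below linearly interpolates a hypothetical Hölder-continuous function with $\xi = 1/2$ (the square root, for which $|\sqrt{a} - \sqrt{b}| \le |a - b|^{1/2}$) from 15 randomly placed points plus the endpoints 0 and 1, and compares the worst-case error with $\kappa^\xi$. It uses numpy.interp as a stand-in for (3.2), assumed here to be plain piecewise-linear interpolation between consecutive observation points; in practice, the values at the endpoints may not be observed.

```python
import numpy as np

def interpolate_outcome(s_obs, q_obs, s_grid):
    """Piecewise-linear interpolation of discretely observed outcome values onto a grid.

    Assumes s_obs is sorted and already contains the endpoints 0 and 1
    (s_{i,0} = 0 and s_{i,m_i+1} = 1, as in Assumption 3.8).
    """
    return np.interp(s_grid, s_obs, q_obs)

rng = np.random.default_rng(0)
m = 15
s_obs = np.concatenate(([0.0], np.sort(rng.uniform(0.0, 1.0, m)), [1.0]))
q = np.sqrt                                   # hypothetical outcome function, Hölder with xi = 1/2
s_grid = np.linspace(0.0, 1.0, 2001)
q_hat = interpolate_outcome(s_obs, q(s_obs), s_grid)
kappa = np.max(np.diff(s_obs))                # largest gap between consecutive observation points
max_err = np.max(np.abs(q_hat - q(s_grid)))
print(max_err, kappa ** 0.5)                  # the error stays below kappa ** xi
```

The bound holds because the interpolant at $s$ is a convex combination of the two neighboring observed values, each within $\kappa^\xi$ of $q_i(s)$ (up to the Hölder constant).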

4 Numerical Experiments
Performance of the 2SLS estimator
In this section, we first examine the finite sample performance
of the proposed 2SLS estimator. We consider the following three data-generating processes (DGPs) for
the Monte Carlo experiments:
$$q_i(s) = \sum_{j=1}^n w_{i,j}\int_0^1 q_j(t)\alpha_0(t,s)\,dt + \sum_{j=1}^7 x_{i,j}\beta_{0j}(s) + \varepsilon_i(s),$$
where
DGP 1: $\alpha_0(t,s) = (t+s)/2$,
DGP 2: $\alpha_0(t,s) =$ PDF of $N(t-s, 0.7^2)$,
DGP 3: $\alpha_0(t,s) = 0.3 + 0.7t\sin(2\pi(t-s))$,
$\beta_{0j}(s) = 1 + 1.2\log(s+1)$ for $j = 1,2,3$, $\beta_{0j}(s) = \exp(s) - 0.4$ for $j = 4,\ldots,7$, $x_{i,j} \overset{\text{IID}}{\sim} N(0,1)$ for all $j$, and $\varepsilon_i(s) = \varepsilon_{1,i} + \sum_{j=1}^4 s^{j/2}\varepsilon_{2,i,j}$ with $\varepsilon_{1,i} \overset{\text{IID}}{\sim} N(0, 0.3^2)$ and $\varepsilon_{2,i,j} \overset{\text{IID}}{\sim} N(0, 0.6^2)$ for all $j$. When estimating the model, an intercept term is also included. We randomly allocate $n$ units on a lattice of size $(n/20) \times 40$, and we consider two sample sizes: $n \in \{400, 1600\}$. The spatial weight matrix $W_n$ is defined according to Rook contiguity with row normalization. Since these three DGPs satisfy the requirements in Assumption 2.1, we can generate the outcome functions $Q$ using the Neumann series approximation $Q \approx Q^{(L)} := \sum_{\ell=0}^{L} T^\ell[X\beta_0 + E]$, where $L$ is increased until $\max_{1\le i\le n}|q_i^{(L)}(s) - q_i^{(L-1)}(s)| < 0.001$ is met for all $s$. The integrals over $[0,1]$ are approximated by finite sums over 199 grid points: $0.005, 0.010, \ldots, 0.995$.
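The data-generating step can be sketched in Python as follows. This is a minimal illustration for DGP 1 only; it assumes a hypothetical fully occupied $10 \times 10$ lattice in place of the random allocation of $n$ units on the $(n/20) \times 40$ lattice, and it discretizes the operator $T$ by a Riemann sum on the same 199-point grid. The helper names are hypothetical.

```python
import numpy as np

def rook_weights(rows, cols):
    """Row-normalized Rook-contiguity weight matrix for a fully occupied rows x cols lattice."""
    n = rows * cols
    W = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):   # Rook neighbors share an edge
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    W[r * cols + c, rr * cols + cc] = 1.0
    row_sums = W.sum(axis=1, keepdims=True)
    return np.divide(W, row_sums, out=np.zeros_like(W), where=row_sums > 0)

def neumann_outcomes(W, alpha, b, dt, tol=1e-3, max_iter=1000):
    """Approximate Q = sum_l T^l b, where (T Q)_i(s) = sum_j w_ij * int q_j(t) alpha(t, s) dt.

    Q and b are (n, G) arrays on the grid; alpha[g, h] = alpha_0(t_g, s_h).
    Stops once the newest term T^L b (= Q^(L) - Q^(L-1)) is below tol everywhere.
    """
    term = b.copy()
    Q = b.copy()
    for _ in range(max_iter):
        term = (W @ term @ alpha) * dt      # apply the operator T once more
        Q = Q + term
        if np.max(np.abs(term)) < tol:
            return Q
    raise RuntimeError("Neumann series did not converge; check Assumption 2.1")

grid = np.arange(1, 200) * 0.005                 # 0.005, 0.010, ..., 0.995
dt = 0.005
alpha = (grid[:, None] + grid[None, :]) / 2      # DGP 1: alpha_0(t, s) = (t + s) / 2
rng = np.random.default_rng(0)
W = rook_weights(10, 10)                         # hypothetical 10 x 10 fully occupied lattice
n = W.shape[0]
X = rng.standard_normal((n, 7))
beta = np.vstack([np.tile(1 + 1.2 * np.log(grid + 1), (3, 1)),   # beta_0j, j = 1, 2, 3
                  np.tile(np.exp(grid) - 0.4, (4, 1))])          # beta_0j, j = 4, ..., 7
eps = rng.normal(0, 0.3, (n, 1)) + sum(grid ** (j / 2) * rng.normal(0, 0.6, (n, 1)) for j in range(1, 5))
b = X @ beta + eps
Q = neumann_outcomes(W, alpha, b, dt)
```

With row normalization and $\int_0^1 |\alpha_0(t,s)|\,dt \le 3/4$ under DGP 1, each application of $T$ shrinks the sup norm by a factor below one, so the loop terminates after a few dozen terms.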
For the basis functions $\{\phi_k\}$, we use cubic B-splines. We examine two values for the number of inner knots, # knots $\in \{2, 3\}$, corresponding to $K = 6$ and $7$, respectively; in both cases, the knots are equally spaced in $[0,1]$. The IVs used are the first- and second-order spatial lags of $\{1, x_{i,1}, \ldots, x_{i,7}\}$. Note that because some units may have no neighboring units, the spatial lags of 1 are not necessarily constant. For the penalty term $\lambda D$, we set $D = I_K$ (i.e., the ridge penalty) and try four values $\lambda = \lambda_c n^{-3/5}$ with $\lambda_c \in \{0.5, 1, 2, 3\}$. The number of Monte Carlo repetitions for each setup is 1000. Throughout, the evaluation point is fixed at $s = 0.5$.
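For concreteness, the sketch below evaluates a clamped cubic B-spline basis with equally spaced inner knots via the standard Cox–de Boor recursion and confirms that 2 and 3 inner knots give $K = 6$ and $K = 7$ basis functions; it also lists the penalty values $\lambda = \lambda_c n^{-3/5}$. The paper does not state which spline implementation it uses, so this is an independent illustration, and the function names are hypothetical.

```python
import numpy as np

def bspline_basis(x, knots, degree):
    """Evaluate all B-spline basis functions at x via the Cox-de Boor recursion."""
    x = np.asarray(x, dtype=float)
    # Degree 0: indicator of the half-open knot interval [t_j, t_{j+1})
    B = np.stack([(knots[j] <= x) & (x < knots[j + 1]) for j in range(len(knots) - 1)], axis=1).astype(float)
    for d in range(1, degree + 1):
        n_basis = len(knots) - d - 1
        new = np.zeros((x.size, n_basis))
        for j in range(n_basis):
            left = knots[j + d] - knots[j]
            right = knots[j + d + 1] - knots[j + 1]
            if left > 0:      # zero-width spans (repeated knots) contribute nothing
                new[:, j] += (x - knots[j]) / left * B[:, j]
            if right > 0:
                new[:, j] += (knots[j + d + 1] - x) / right * B[:, j + 1]
        B = new
    return B

def cubic_bspline_basis(x, n_inner_knots):
    """Cubic B-splines on [0, 1] with equally spaced inner knots; K = n_inner_knots + 4 columns."""
    inner = np.linspace(0, 1, n_inner_knots + 2)[1:-1]
    knots = np.concatenate(([0.0] * 4, inner, [1.0] * 4))   # clamped knot vector
    return bspline_basis(x, knots, 3)

grid = np.arange(1, 200) * 0.005
for n_knots in (2, 3):
    print(n_knots, cubic_bspline_basis(grid, n_knots).shape[1])   # 2 -> 6, 3 -> 7
for n in (400, 1600):
    print(n, [lam_c * n ** (-3 / 5) for lam_c in (0.5, 1, 2, 3)])  # lambda = lambda_c * n^(-3/5), D = I_K
```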
The performance of the coefficient estimator $\hat\beta_n$ is evaluated using the average bias (BIAS) and the average root mean squared error (RMSE):
$$\text{BIAS: } \frac{1}{7}\sum_{j=1}^{7}\left[\frac{1}{1000}\sum_{r=1}^{1000}\hat\beta^{(r)}_{nj}(s) - \beta_{0j}(s)\right], \qquad \text{RMSE: } \frac{1}{7}\sum_{j=1}^{7}\left[\frac{1}{1000}\sum_{r=1}^{1000}\left(\hat\beta^{(r)}_{nj}(s) - \beta_{0j}(s)\right)^2\right]^{1/2},$$
where the superscript $(r)$ indicates that the estimate is obtained from the $r$-th replicated dataset. Similarly, for the estimator $\hat\alpha_n$ of the spatial effect, we evaluate the performance based on the BIAS and RMSE averaged over 19 evaluation points $\{t_1, t_2, \ldots, t_{19}\}$ equally spaced on $[0,1]$:
$$\text{BIAS: } \frac{1}{19}\sum_{j=1}^{19}\left[\frac{1}{1000}\sum_{r=1}^{1000}\hat\alpha^{(r)}_n(t_j,s) - \alpha_0(t_j,s)\right], \qquad \text{RMSE: } \frac{1}{19}\sum_{j=1}^{19}\left[\frac{1}{1000}\sum_{r=1}^{1000}\left(\hat\alpha^{(r)}_n(t_j,s) - \alpha_0(t_j,s)\right)^2\right]^{1/2}.$$
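Both criteria share the same structure: a per-component bias or RMSE over replications, averaged over components. A minimal sketch, assuming the Monte Carlo estimates are stored as a replications-by-components array (the function name is hypothetical):

```python
import numpy as np

def average_bias_rmse(estimates, truth):
    """Average BIAS and RMSE across components, following the formulas above.

    estimates : (R, J) array; row r holds the estimates from the r-th replication
                (J = 7 coefficients, or J = 19 evaluation points t_j).
    truth     : (J,) array of true values beta_0j(s) or alpha_0(t_j, s).
    """
    err = np.asarray(estimates, dtype=float) - np.asarray(truth, dtype=float)[None, :]
    bias = np.mean(err.mean(axis=0))                   # average over j of (mean over r - truth)
    rmse = np.mean(np.sqrt((err ** 2).mean(axis=0)))   # average over j of per-component RMSE
    return float(bias), float(rmse)
```

Note that the RMSE is averaged after taking square roots component by component, so it is not the same as the square root of the pooled mean squared error.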
Table 1 summarizes the simulation results. Our main findings are as follows. First, the results suggest that our estimator works satisfactorily in all scenarios. The RMSE values for estimating $\beta_0$ are approximately halved when the sample size is increased from 400 to 1600, which is consistent with the root-$n$ rate in Theorem 3.1(i). Meanwhile, the RMSE values for estimating $\alpha_0$ do not decrease substantially even when the sample size is increased. This is likely owing to the increased variance caused by employing a smaller penalty parameter $\lambda$ (recall that $\lambda \sim n^{-3/5}$). Comparing the estimators with different $\lambda$ values, our results suggest that when the functional form of the spatial effect $\alpha_0$ is simple, as in DGPs 1 and 2, using a relatively large penalty is advisable in terms of RMSE. In contrast, when the functional form of $\alpha_0$ is complex, as in DGP 3, the estimator with the smallest penalty outperforms the others, which is a reasonable result. The number of inner knots seems to have only a minor impact on estimation performance.
Table 1: Estimation performance

| DGP | n | # knots | β BIAS | β RMSE | α (λc = 0.5) BIAS | α (λc = 0.5) RMSE | α (λc = 1) BIAS | α (λc = 1) RMSE | α (λc = 2) BIAS | α (λc = 2) RMSE | α (λc = 3) BIAS | α (λc = 3) RMSE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 400 | 2 | -0.0010 | 0.0361 | 0.0201 | 0.1059 | 0.0213 | 0.1024 | 0.0188 | 0.1009 | 0.0147 | 0.0996 |
| 1 | 400 | 3 | -0.0010 | 0.0362 | 0.0207 | 0.1186 | 0.0216 | 0.1146 | 0.0183 | 0.1124 | 0.0135 | 0.1108 |
| 1 | 1600 | 2 | -0.0004 | 0.0178 | 0.0151 | 0.0957 | 0.0189 | 0.0950 | 0.0209 | 0.0967 | 0.0207 | 0.0979 |
| 1 | 1600 | 3 | -0.0004 | 0.0178 | 0.0160 | 0.1100 | 0.0196 | 0.1085 | 0.0213 | 0.1093 | 0.0207 | 0.1101 |
| 2 | 400 | 2 | -0.0017 | 0.0365 | -0.0010 | 0.0922 | -0.0054 | 0.0836 | -0.0124 | 0.0776 | -0.0187 | 0.0748 |
| 2 | 400 | 3 | -0.0018 | 0.0366 | -0.0010 | 0.1079 | -0.0057 | 0.0997 | -0.0133 | 0.0937 | -0.0202 | 0.0908 |
| 2 | 1600 | 2 | -0.0006 | 0.0179 | 0.0014 | 0.0852 | -0.0011 | 0.0814 | -0.0048 | 0.0781 | -0.0080 | 0.0764 |
| 2 | 1600 | 3 | -0.0006 | 0.0179 | 0.0015 | 0.1018 | -0.0010 | 0.0981 | -0.0050 | 0.0948 | -0.0084 | 0.0931 |
| 3 | 400 | 2 | -0.0017 | 0.0394 | 0.0148 | 0.1821 | 0.0177 | 0.1944 | 0.0155 | 0.2060 | 0.0107 | 0.2111 |
| 3 | 400 | 3 | -0.0017 | 0.0394 | 0.0154 | 0.1850 | 0.0177 | 0.1991 | 0.0144 | 0.2115 | 0.0088 | 0.2165 |
| 3 | 1600 | 2 | -0.0006 | 0.0192 | 0.0077 | 0.1544 | 0.0133 | 0.1680 | 0.0172 | 0.1860 | 0.0175 | 0.1955 |
| 3 | 1600 | 3 | -0.0006 | 0.0192 | 0.0087 | 0.1531 | 0.0140 | 0.1705 | 0.0173 | 0.1906 | 0.0172 | 0.2008 |
Performance of the Wald test
Next, we assess the finite sample performance of our test for the
presence of spatial effects. In this analysis, we use the same DGP as given above to generate the data,
with a slight modification of $\alpha_0$ in DGP 2. Specifically,
$$\alpha_0(t,s) = \varrho \times \text{PDF of } N(t-s, 0.7^2),$$
where $\varrho \in \{0, 0.1, 0.2\}$. The null hypothesis to be tested is $H_0: \alpha_0(t, 0.5) = 0$ for $t \in [0.1, 0.9]$. Thus, $H_0$ holds true when $\varrho = 0$.
In Table 2, we present the rejection frequencies over 1000 Monte Carlo repetitions at the 10%, 5%, and 1% significance levels. The results for $\varrho = 0$ demonstrate that the size of our test is reasonably well controlled, deviating from the nominal levels by at most about 1–2 percentage points in most cases. When the spatial effect is mild in magnitude ($\varrho = 0.1$), the test based on the estimator with the smallest penalty ($\lambda_c = 0.5$) is not sufficiently powerful to detect the effect, probably owing to its large estimation variance. However, as expected, the power of the test improves significantly as the sample size increases. In the case of a stronger spatial effect ($\varrho = 0.2$), all tests exhibit nearly perfect power for both sample sizes.
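Table 2: Rejection frequency

| # knots | n | λc | ϱ = 0, 10% | ϱ = 0, 5% | ϱ = 0, 1% | ϱ = 0.1, 10% | ϱ = 0.1, 5% | ϱ = 0.1, 1% | ϱ = 0.2, 10% | ϱ = 0.2, 5% | ϱ = 0.2, 1% |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 400 | 0.5 | 0.085 | 0.048 | 0.023 | 0.315 | 0.231 | 0.128 | 0.985 | 0.974 | 0.900 |
| 2 | 400 | 1 | 0.080 | 0.054 | 0.021 | 0.762 | 0.686 | 0.497 | 1.000 | 1.000 | 0.998 |
| 2 | 400 | 2 | 0.074 | 0.047 | 0.018 | 0.971 | 0.956 | 0.908 | 1.000 | 1.000 | 1.000 |
| 2 | 400 | 3 | 0.068 | 0.047 | 0.022 | 0.982 | 0.977 | 0.959 | 1.000 | 1.000 | 1.000 |
| 2 | 1600 | 0.5 | 0.081 | 0.058 | 0.030 | 0.642 | 0.474 | 0.260 | 1.000 | 1.000 | 1.000 |
| 2 | 1600 | 1 | 0.084 | 0.056 | 0.029 | 0.980 | 0.951 | 0.811 | 1.000 | 1.000 | 1.000 |
| 2 | 1600 | 2 | 0.087 | 0.057 | 0.029 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| 2 | 1600 | 3 | 0.086 | 0.056 | 0.027 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| 3 | 400 | 0.5 | 0.085 | 0.048 | 0.023 | 0.322 | 0.236 | 0.129 | 0.986 | 0.976 | 0.912 |
| 3 | 400 | 1 | 0.080 | 0.053 | 0.021 | 0.782 | 0.695 | 0.516 | 1.000 | 1.000 | 0.998 |
| 3 | 400 | 2 | 0.077 | 0.046 | 0.017 | 0.972 | 0.957 | 0.915 | 1.000 | 1.000 | 1.000 |
| 3 | 400 | 3 | 0.069 | 0.047 | 0.023 | 0.982 | 0.979 | 0.962 | 1.000 | 1.000 | 1.000 |
| 3 | 1600 | 0.5 | 0.077 | 0.054 | 0.028 | 0.624 | 0.457 | 0.243 | 1.000 | 1.000 | 1.000 |
| 3 | 1600 | 1 | 0.081 | 0.055 | 0.028 | 0.979 | 0.952 | 0.817 | 1.000 | 1.000 | 1.000 |
| 3 | 1600 | 2 | 0.085 | 0.055 | 0.028 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| 3 | 1600 | 3 | 0.081 | 0.053 | 0.027 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |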
Simulations under discretely observed outcome functions
Finally, we evaluate the performance of our estimator and test when the outcome functions are not fully observed but their values are observable at finitely many points. The DGPs investigated here are identical to those used previously. To recover the entire functional form of the outcome function for each unit, we use the linear interpolation method in (3.2). For all units, we assume that $m$ pairs of points $\{(s_{i,j}, q_i(s_{i,j}))\}_{j=1}^m$ are observable, where the $s_{i,j}$'s are drawn uniformly at random from $[0,1]$, and $m$ takes one of two values, $m \in \{15, 50\}$.
To save space, the simulation results are omitted here and provided in Tables B1 and B2 in Appendix B. These tables show overall tendencies similar to those shown above. An interesting finding is that, although increasing $m$ from 15 to 50 improves the RMSE in most cases, in some situations the estimator with the smaller $m$ achieves a slightly better RMSE. Similarly, comparing the results for $m = 15$ with those for fully observed outcome functions (reported in Table 1), the former occasionally exhibits smaller RMSE values. We conjecture that these phenomena occur because the linear interpolation “smooths out” the original, potentially noisier, outcome function, thereby reducing estimation variance. A similar discussion can be found in Imaizumi and Kato (2018) in a different but related context. In contrast, regarding the size of the Wald test, the linear interpolation seems to introduce some distortions. Unsurprisingly, these distortions can be somewhat mitigated when $m$ is large. Except when $\lambda_c = 0.5$, the test exhibits satisfactory power for both values of $m$.

5 An Empirical Illustration: Age Distribution of Japanese Cities
In this section, we apply the proposed estimator and test to analyze the determinants of the age distribution of Japanese cities. While this type of data has been regularly studied in the FDA literature (e.g., Delicado, 2011; Hron et al., 2016; Bigot et al., 2017), few papers have attempted a regression-based analysis. In recent decades, many rural Japanese cities have been facing serious population aging, prompting them to launch campaigns encouraging young people from urban areas to settle there. Thus, it would be meaningful to investigate the relationship between regional socioeconomic characteristics and age structure, as well as the impact of neighborhood trends on it.
Our sample comprises all local municipalities (Shi-ku-cho-son) in Japan. The age distribution data
for each city are taken from the 2020 Census. For the covariates to explain the age distribution, we use
the ratio of agricultural, forestry, and fishery workers, number of hospital beds per capita, number of
childcare facilities per capita, unemployment rate, logarithm of annual commercial sales, and logarithm
of average residential landprice. All variables are as of the most recent year before 2020, and they
are all publicly available.3 In addition to these, we include five regional dummies.4 After excluding
the observations with missing items, the analysis is performed on 1883 municipalities. Table C3 in
Appendix C summarizes the detailed definitions of the variables used and their basic statistics.
Our age distribution data are incomplete; we only have information on population sizes in five-year age groups (0–4 years old, 5–9 years old, and so forth). Therefore, when computing the quantile function for each city, we performed linear interpolation as in (3.2). Figure 1 depicts the resulting quantile functions for 20 randomly selected cities from our dataset. The figure clearly shows regional heterogeneity in age composition, except near the boundary quantiles.
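One way to carry out this step, sketched under stated assumptions: treat the population as uniformly spread within each five-year bin, so that the CDF is piecewise linear between bin edges, and invert it with numpy.interp. The upper edge of the last, open-ended age group must be chosen by the user (the value 100 below is an arbitrary placeholder), and all bin counts are assumed positive so that the CDF is strictly increasing. This is an illustration, not necessarily the exact computation behind Figure 1, and the function name is hypothetical.

```python
import numpy as np

def age_quantile_function(counts, edges, s):
    """Quantile function from population counts in age bins, via a piecewise-linear CDF.

    counts[k] is the population aged in [edges[k], edges[k+1]); all counts assumed > 0.
    """
    counts = np.asarray(counts, dtype=float)
    cdf = np.concatenate(([0.0], np.cumsum(counts) / counts.sum()))  # CDF at the bin edges
    return np.interp(s, cdf, edges)   # invert the piecewise-linear CDF

edges = np.arange(0, 105, 5)          # 0, 5, ..., 100: twenty 5-year bins (100 is a placeholder cap)
counts = np.ones(20)                  # hypothetical flat age profile
print(age_quantile_function(counts, edges, [0.1, 0.5, 0.9]))   # [10. 50. 90.]
```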
For estimation, we follow the same procedure as in the previous section with $K = 7$ (three inner knots) and $\lambda = 3n^{-3/5}$. The integrals are replaced by sums over 399 equally spaced grid points on $[0,1]$. For the spatial weights, expecting that the impacts of demographic changes in large cities should be larger than those from small cities, we consider the following specification:
$$w_{i,j} = \frac{1\{i \text{ and } j \text{ are adjacent}\}\sqrt{\text{Population}_j}}{\sum_{j' \ne i}1\{i \text{ and } j' \text{ are adjacent}\}\sqrt{\text{Population}_{j'}}}.$$
When city $i$ has no neighbors (e.g., islands), we set $w_{i,j} = 0$ for all $j$. The estimation is performed at nine quantile levels: $s = 0.1, 0.2, \ldots, 0.9$.
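A minimal sketch of this weighting scheme, assuming a symmetric 0/1 adjacency matrix and a vector of city populations are available (the function name is hypothetical):

```python
import numpy as np

def sqrt_population_weights(adjacency, population):
    """w_ij proportional to 1{i, j adjacent} * sqrt(Population_j), row-normalized.

    Rows of isolated units (no neighbors) are left as all zeros.
    """
    A = np.asarray(adjacency, dtype=float).copy()
    np.fill_diagonal(A, 0.0)                                   # exclude j = i
    raw = A * np.sqrt(np.asarray(population, dtype=float))[None, :]
    row_sums = raw.sum(axis=1, keepdims=True)
    return np.divide(raw, row_sums, out=np.zeros_like(raw), where=row_sums > 0)
```

For example, a city whose two neighbors have populations 4 and 16 assigns them weights $2/6$ and $4/6$, respectively.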
To save space, the estimates of $\beta_0(s)$ are presented in Figure C1 in Appendix C. Our major findings from the figure are as follows. Interestingly, for all variables, the impacts on the age distribution become prominent around the median ($s = 0.5$), suggesting that this age group is residentially flexible in response to the socioeconomic conditions of a city. The variables regarded as indicators of urbanness, such as commercial sales and the landprice, exhibit negative effects, contributing to population rejuvenation. As expected, cities with a higher ratio of agricultural workers exhibit a significant aging trend. Both the number of hospital beds and the number of childcare facilities have positive effects on the age quantiles, although the underlying mechanisms are unclear. It is important to recall that in this study, the covariates are treated as fixed, and their potential endogeneity is ignored. To interpret the obtained results as causal relationships, the endogeneity issue would need to be addressed more carefully.

3 Landprice data: https://www.lic.or.jp/landinfo/research.html; all others: https://www.e-stat.go.jp/en.
4 They correspond to each of the following: Hokkaido-Tohoku, Chubu, Kinki, Chugoku-Shikoku, and Kyushu-Okinawa regions.

Figure 1: Age quantile functions of randomly selected cities
The estimated spatial effect function is reported in Figure 2. The figure includes nine panels, each corresponding to a different value of $s$. In the figure, we also report the computed test statistic $(T_n - \mu_n)/\sqrt{v_n}$ for $\mathcal{I} = [0,1]$. From these results, we observe the following. First, the values of the test statistic suggest that the spatial effects are significant at all nine quantiles. However, when the neighbor's quantile level $t$ is close to either boundary point, 0 or 1, the spatial effects are weak or absent. This seems reasonable in view of Figure 1: there is little regional heterogeneity in the age distribution at these extreme quantiles. The spatial interaction effects become particularly strong when both $t$ and $s$ are approximately 0.2–0.5, which roughly corresponds to the ages of the younger working population. This result might suggest that the growth of economic activities and their spillovers play a major role in forming the spatial trend of the age distribution. Notably, the impacts from these lower-to-middle quantiles persist to some extent even for higher quantile ages. This may reflect indirect effects arising from positive interactions among younger age groups, rather than a direct causal relationship across different quantiles.
6 Conclusion
In this study, we developed a new SAR model for analyzing spatial interactions among functional
outcomes. For estimation, we developed a penalized 2SLS estimator and established its asymptotic
properties under certain regularity conditions. Additionally, we developed a method for statistically
testing the presence of spatial interactions. To illustrate the effectiveness of our proposed method, we
performed an empirical analysis focusing on the age distribution in Japanese cities.
An important potential limitation of our study is that, while we have treated the covariates as fixed to simplify the theoretical exposition, this approach essentially ignores the potential endogeneity of the covariates. For instance, in our empirical analysis, it might be reasonable to regard the unemployment rate as an endogenous variable correlated with unobserved regional factors that also affect the age distribution. One way to mitigate the endogeneity issue would be to extend the current model to a panel data model with functional fixed effects, which is a promising topic for future research. Another important direction for future work is how to perform estimation and inference when $\alpha_0$ and $\beta_0$ are close to zero, leading to a weak IV problem. We conjecture that including additional moment conditions based on the distribution of the error term might be effective in addressing this issue, as in Lee (2007). Other issues that need future investigation include data-driven selection of tuning parameters and methods for uniform inference on the functional parameters.

Figure 2: Estimated spatial effect function
In each panel, the solid line indicates the estimate of $\alpha_0(\cdot,s)$, and the dotted lines indicate the 95% confidence interval.

References
Ando, T., Li, K., and Lu, L., 2023. A spatial panel quantile model with unobserved heterogeneity, Journal of Econometrics, 232 (1), 191–213.
Belloni, A., Chernozhukov, V., Chetverikov, D., and Kato, K., 2015. Some new asymptotic theory for least squares series: Pointwise and uniform results, Journal of Econometrics, 186 (2), 345–366.
Bigot, J., Gouet, R., Klein, T., and López, A., 2017. Geodesic PCA in the Wasserstein space by convex PCA, Annales de l'Institut Henri Poincaré, Probabilités et Statistiques, 53 (1), 1–26.
Blundell, R., Chen, X., and Kristensen, D., 2007. Semi-nonparametric IV estimation of shape-invariant Engel curves, Econometrica, 75 (6), 1613–1669.
Breunig, C., Mammen, E., and Simoni, A., 2020. Ill-posed estimation in high-dimensional models with instrumental variables, Journal of Econometrics, 219 (1), 171–200.
Chen, Y., Lin, Z., and Müller, H.G., 2023. Wasserstein regression, Journal of the American Statistical Association, 118 (542), 869–882.
Crambes, C., Kneip, A., and Sarda, P., 2009. Smoothing splines estimators for functional linear regression, The Annals of Statistics, 37 (1), 35–72.
de Jong, P., 1987. A central limit theorem for generalized quadratic forms, Probability Theory and Related Fields, 75 (2), 261–277.
Delicado, P., 2011. Dimensionality reduction when data are density functions, Computational Statistics & Data Analysis, 55 (1), 401–420.
Dong, C., Chen, R., Xiao, Z., and Liu, W., 2024. Functional quantile autoregression, Journal of Econometrics, forthcoming.
Dudley, R.M. and Philipp, W., 1983. Invariance principles for sums of Banach space valued random elements and empirical processes, Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 62 (4), 509–552.
Ghodrati, L. and Panaretos, V.M., 2022. Distribution-on-distribution regression via optimal transport maps, Biometrika, 109 (4), 957–974.
Gunsilius, F.F., 2023. Distributional synthetic controls, Econometrica, 91 (3), 1105–1117.
Han, K., Müller, H.G., and Park, B.U., 2020. Additive functional regression for densities as responses, Journal of the American Statistical Association, 115 (530), 997–1010.
Hausman, J.A., Newey, W.K., Woutersen, T., Chao, J.C., and Swanson, N.R., 2012. Instrumental variable estimation with heteroskedasticity and many instruments, Quantitative Economics, 3 (2), 211–255.
Hoshino, T., 2022. Sieve IV estimation of cross-sectional interaction models with nonparametric endogenous effect, Journal of Econometrics, 229 (2), 263–275.
Hron, K., Menafoglio, A., Templ, M., Hruzová, K., and Filzmoser, P., 2016. Simplicial principal component analysis for density functions in Bayes spaces, Computational Statistics & Data Analysis, 94, 330–350.
Imaizumi, M. and Kato, K., 2018. PCA-based estimation for functional linear regression with functional responses, Journal of Multivariate Analysis, 163, 15–36.
Jenish, N., 2012. Nonparametric spatial regression under near-epoch dependence, Journal of Econometrics, 167 (1), 224–239.
Jenish, N. and Prucha, I.R., 2009. Central limit theorems and uniform laws of large numbers for arrays of random fields, Journal of Econometrics, 150 (1), 86–98.
Jenish, N. and Prucha, I.R., 2012. On spatial processes and asymptotic inference under near-epoch dependence, Journal of Econometrics, 170 (1), 178–190.
Kelejian, H.H. and Prucha, I.R., 2010. Specification and estimation of spatial autoregressive models with autoregressive and heteroskedastic disturbances, Journal of Econometrics, 157 (1), 53–67.
Kress, R., 2014. Linear Integral Equations, Third Edition, Springer.
Lee, L.f., 2007. GMM and 2SLS estimation of mixed regressive, spatial autoregressive models, Journal of Econometrics, 137 (2), 489–514.
Malikov, E., Sun, Y., and Hite, D., 2019. (Under) mining local residential property values: A semiparametric spatial quantile autoregression, Journal of Applied Econometrics, 34 (1), 82–109.
Panaretos, V.M. and Zemel, Y., 2020. An Invitation to Statistics in Wasserstein Space, Springer.
Petersen, A., Liu, X., and Divani, A.A., 2021. Wasserstein F-tests and confidence bands for the Fréchet regression of density response curves, The Annals of Statistics, 49 (1), 590–611.
Petersen, A. and Müller, H.G., 2016. Functional data analysis for density functions by transformation to a Hilbert space, The Annals of Statistics, 183–218.
Petersen, A., Zhang, C., and Kokoszka, P., 2022. Modeling probability density functions as data objects, Econometrics and Statistics, 21, 159–178.
Ramsay, J. and Silverman, B., 2005. Functional Data Analysis, Springer.
Su, L. and Yang, Z., 2011. Instrumental variable quantile estimation of spatial autoregressive models, Working Paper.
Tchuente, G., 2019. Weak identification and estimation of social interaction models, arXiv preprint arXiv:1902.06143.
Yang, H., 2020. Random distributional response model based on spline method, Journal of Statistical Planning and Inference, 207, 27–44.
Yang, H., Baladandayuthapani, V., Rao, A.U., and Morris, J.S., 2020. Quantile function on scalar regression analysis for distributional data, Journal of the American Statistical Association, 115 (529), 90–106.
Zhu, X., Cai, Z., and Ma, Y., 2022. Network functional varying coefficient model, Journal of the American Statistical Association, 117 (540), 2074–2085.

Appendix
A Proofs
Definition A.1. Let $x = \{x_{n,i}: i \in D_n; n \ge 1\}$ and $e = \{e_{n,i}: i \in D_n; n \ge 1\}$ be triangular arrays of random fields, where $x$ and $e$ are real-valued and general (possibly infinite-dimensional) random variables, respectively. Then, the random field $x$ is said to be $L_p$-near-epoch dependent (NED) on $e$ if
$$\left\|x_{n,i} - \mathbb{E}[x_{n,i} \mid \mathcal{F}_{n,i}(\delta)]\right\|_p \le c_{n,i}\,\phi(\delta)$$
for an array of finite positive constants $\{c_{n,i}: i \in D_n; n \ge 1\}$ and some function $\phi(\delta) \ge 0$ with $\phi(\delta) \to 0$ as $\delta \to \infty$, where $\mathcal{F}_{n,i}(\delta)$ is the $\sigma$-field generated by $\{e_{n,j}: \Delta(i,j) \le \delta\}$. The $c_{n,i}$'s and $\phi(\delta)$ are called the NED scaling factors and the NED coefficient, respectively. The field $x$ is said to be uniformly $L_p$-NED on $e$ if $c_{n,i}$ is uniformly bounded. If $\phi(\delta) \lesssim \varrho^\delta$ for some $0 < \varrho < 1$, then it is called geometrically $L_p$-NED.
Lemma A.1. Suppose that Assumptions 2.1, 3.1, 3.2(i), and 3.3(i) hold. Then, for a given $s \in (0,1)$, $\{q_i(s): i \in D_n; n \ge 1\}$ is uniformly and geometrically $L_p$-NED on $\{\varepsilon_i: i \in D_n; n \ge 1\}$.
Proof. We prove the lemma in a similar manner to Jenish (2012) and Hoshino (2022). First, note that under Assumption 2.1, $Q$ is uniquely determined in $\mathcal{H}_{n,\infty}$ as $Q = (I_d - T)^{-1}X\beta_0 + (I_d - T)^{-1}E$. We denote the $i$-th element of $(I_d - T)^{-1}X\beta_0 + (I_d - T)^{-1}[\cdot]$ as $f_i[\cdot]$, such that $q_i = f_i[E]$ holds for each $i = 1, \ldots, n$. Define
$$E^{(\delta)}_{1,i} := \{\varepsilon_j\}_{j:\Delta(i,j)\le\delta}, \qquad E^{(\delta)}_{2,i} := \{\varepsilon_j\}_{j:\Delta(i,j)>\delta}$$
for some $\delta > 0$. Since $L_p(0,1)$ is separable for $1 \le p < \infty$, under Assumption 3.3(i), both $E^{(\delta)}_{1,i}$ and $E^{(\delta)}_{2,i}$ are Polish space-valued random elements in $(\mathcal{H}_{|\{j:\Delta(i,j)\le\delta\}|,p}, \|\cdot\|_{\infty,p})$ and $(\mathcal{H}_{|\{j:\Delta(i,j)>\delta\}|,p}, \|\cdot\|_{\infty,p})$, respectively (recall: $\mathcal{H}_{n,p} := \{H = (h_1, \ldots, h_n): h_i \in L_p(0,1) \text{ for all } i\}$).
Then, by Lemma 2.11 of Dudley and Philipp (1983) (see also Lemma A.1 of Jenish (2012)), there exists a function $\chi$ such that $(E^{(\delta)}_{1,i}, \chi(U, E^{(\delta)}_{1,i}))$ has the same law as $(E^{(\delta)}_{1,i}, E^{(\delta)}_{2,i})$, which is an appropriate rearrangement of $E$, where $U$ is a random variable uniformly distributed on $[0,1]$ and independent of $E^{(\delta)}_{1,i}$.
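Now, write $f_i[E^{(\delta)}_{1,i}, E^{(\delta)}_{2,i}] \equiv f_i[E]$, and define $q^{(\delta)}_i := f_i[E^{(\delta)}_{1,i}, \chi(U, E^{(\delta)}_{1,i})] \equiv f_i[E^{(\delta)}]$ with $E^{(\delta)} = (\varepsilon^{(\delta)}_1, \ldots, \varepsilon^{(\delta)}_n)^\top$; specifically,
$$q^{(\delta)}_i(s) = \left\{(I_d - T)^{-1}[X\beta_0 + E^{(\delta)}](s)\right\}_i = \sum_{j=1}^n w_{i,j}\int_0^1 q^{(\delta)}_j(t)\alpha_0(t,s)\,dt + x_i^\top\beta_0(s) + \varepsilon^{(\delta)}_i(s).$$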
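By construction, we have
$$\mathbb{E}[q_i(s) \mid \mathcal{F}_{n,i}(\delta)] = \mathbb{E}\left[f_i[E^{(\delta)}_{1,i}, E^{(\delta)}_{2,i}](s) \mid E^{(\delta)}_{1,i}\right] = \mathbb{E}\left[f_i[E^{(\delta)}_{1,i}, \chi(U, E^{(\delta)}_{1,i})](s) \mid E^{(\delta)}_{1,i}\right] = \mathbb{E}[q^{(\delta)}_i(s) \mid \mathcal{F}_{n,i}(\delta)],$$
where $\mathcal{F}_{n,i}(\delta)$ is the $\sigma$-field generated by $\{\varepsilon_j: \Delta(i,j) \le \delta\}$.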
Here, suppose that $0 < \delta < \Delta$, where $\Delta$ is as given in Assumption 3.1(ii). Then, because at least $i$'s own $\varepsilon_i$ is included in $E^{(\delta)}_{1,i}$, we have $\varepsilon_i \equiv \varepsilon^{(\delta)}_i$, and hence
$$q_i(s) - q^{(\delta)}_i(s) = \sum_{j=1}^n w_{i,j}\int_0^1\left[q_j(t) - q^{(\delta)}_j(t)\right]\alpha_0(t,s)\,dt$$
holds. By Minkowski's inequality,
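$$\begin{aligned}
\left\|q_i(s) - q^{(\delta)}_i(s)\right\|_p &= \left\|\sum_{j=1}^n w_{i,j}\int_0^1\left[q_j(t) - q^{(\delta)}_j(t)\right]\alpha_0(t,s)\,dt\right\|_p \le \sum_{j=1}^n |w_{i,j}| \int_0^1\left\|q_j(t) - q^{(\delta)}_j(t)\right\|_p|\alpha_0(t,s)|\,dt\\
&\le 2\alpha_0\|W_n\|_\infty\max_{1\le j\le n}\int_0^1\|q_j(t)\|_p\,dt \le C_p\cdot\varrho,
\end{aligned}$$
where $C_p := 2\sup_{1\le i\le n;\, n\ge1}\int_0^1\|q_i(t)\|_p\,dt$ and $\varrho := \alpha_0\|W_n\|_\infty$. Similarly, when $\Delta \le \delta < 2\Delta$ holds,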
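noting now that under Assumption 3.1(ii) we have $\varepsilon_j \equiv \varepsilon^{(\delta)}_j$ for all direct neighbors $j$ of $i$,
$$\begin{aligned}
\left\|q_i(s) - q^{(\delta)}_i(s)\right\|_p &\le \alpha_0\sum_{j=1}^n |w_{i,j}| \int_0^1\left\|q_j(t_1) - q^{(\delta)}_j(t_1)\right\|_p dt_1\\
&= \alpha_0\sum_{j=1}^n |w_{i,j}| \int_0^1\left\|\sum_{k=1}^n w_{j,k}\int_0^1\left[q_k(t_2) - q^{(\delta)}_k(t_2)\right]\alpha_0(t_2,t_1)\,dt_2\right\|_p dt_1\\
&\le \alpha_0^2\sum_{j=1}^n |w_{i,j}| \sum_{k=1}^n |w_{j,k}| \int_0^1\int_0^1\left\|q_k(t_2) - q^{(\delta)}_k(t_2)\right\|_p dt_2\,dt_1 \le C_p\cdot\varrho^2.
\end{aligned}$$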
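Applying the same argument recursively, for $m\Delta \le \delta < (m+1)\Delta$, such that $\varepsilon_j \equiv \varepsilon^{(\delta)}_j$ for all $j$'s in the $m$-th order neighborhood of $i$, we obtain
$$\left\|q_i(s) - q^{(\delta)}_i(s)\right\|_p \le C_p\cdot\varrho^{\lfloor\delta/\Delta\rfloor + 1}. \tag{A.1}$$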
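Finally, by Jensen's inequality and (A.1),
$$\begin{aligned}
\left\| q_i(s) - \mathbb{E}[q_i(s) \mid \mathcal{F}_{n,i}(\delta)] \right\|_p &= \left\| \int_0^1 \left( f_i[\mathcal{E}^{(\delta)}_{1,i}, \mathcal{E}^{(\delta)}_{2,i}](s) - f_i[\mathcal{E}^{(\delta)}_{1,i}, \chi(u, \mathcal{E}^{(\delta)}_{1,i})](s) \right) du \right\|_p \\
&\le \left\{ \mathbb{E} \int_0^1 \left| f_i[\mathcal{E}^{(\delta)}_{1,i}, \mathcal{E}^{(\delta)}_{2,i}](s) - f_i[\mathcal{E}^{(\delta)}_{1,i}, \chi(u, \mathcal{E}^{(\delta)}_{1,i})](s) \right|^p du \right\}^{1/p} \\
&= \left\{ \mathbb{E} \left| f_i[\mathcal{E}^{(\delta)}_{1,i}, \mathcal{E}^{(\delta)}_{2,i}](s) - f_i[\mathcal{E}^{(\delta)}_{1,i}, \chi(U, \mathcal{E}^{(\delta)}_{1,i})](s) \right|^p \right\}^{1/p} \\
&= \left\| f_i[\mathcal{E}^{(\delta)}_{1,i}, \mathcal{E}^{(\delta)}_{2,i}](s) - f_i[\mathcal{E}^{(\delta)}_{1,i}, \chi(U, \mathcal{E}^{(\delta)}_{1,i})](s) \right\|_p \\
&= \left\| q_i(s) - q^{(\delta)}_i(s) \right\|_p \le C_p \cdot \varrho^{\lfloor \delta/\Delta \rfloor + 1} \to 0
\end{aligned}$$
as $\delta \to \infty$ by Assumption 2.1. This completes the proof.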
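The geometric decay in (A.1) has a simple deterministic counterpart that can be checked numerically. The sketch below is an illustration under assumptions that are not in the text: a ring of $n$ units with row-normalized weights, a Riemann-sum approximation of the integral over $t$, and a hypothetical kernel $\alpha_0(t,s) = a\exp(-(t-s)^2)$ with $a = 0.6$, so that $\varrho = a\|W_n\|_\infty = 0.6$. Shocks are perturbed only outside the $m$-th order neighborhood of unit 0, and the resulting change in $q_0$ is compared with $\varrho^{m+1}/(1-\varrho)$ times the size of the perturbation, which is what summing the Neumann series of $(I_d - T)^{-1}$ from order $m+1$ onward gives.

```python
import numpy as np


def solve_functional_sar(W, A, F):
    """Solve Q = W Q (A / G) + F, a grid version of q = T q + f.

    Row i of Q is q_i evaluated on G grid points; the integral over t is
    replaced by a Riemann sum with weights 1/G (an illustrative assumption).
    """
    n, G = F.shape
    M = A / G
    # For row-major vectorization, vec(W Q M) = kron(W, M.T) vec(Q).
    K = np.kron(W, M.T)
    q = np.linalg.solve(np.eye(n * G) - K, F.ravel())
    return q.reshape(n, G)


rng = np.random.default_rng(0)
n, G, a = 30, 40, 0.6
grid = (np.arange(G) + 0.5) / G
A = a * np.exp(-(grid[:, None] - grid[None, :]) ** 2)  # hypothetical alpha_0, |A| <= a

W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = 0.5  # ring network, row-normalized (hypothetical)
    W[i, (i + 1) % n] = 0.5
rho = a * np.abs(W).sum(axis=1).max()  # plays the role of alpha_bar * ||W_n||_inf

F = rng.normal(size=(n, G))  # stands in for X beta_0 + epsilon on the grid
Q = solve_functional_sar(W, A, F)

dist = np.minimum(np.arange(n), n - np.arange(n))  # ring distance to unit 0
results = []
for m in range(1, 8):
    dF = np.zeros_like(F)
    far = dist > m  # perturb only units outside the m-th order neighborhood of unit 0
    dF[far] = rng.uniform(-1.0, 1.0, size=(far.sum(), G))
    change = np.abs(solve_functional_sar(W, A, F + dF)[0] - Q[0]).max()
    bound = rho ** (m + 1) / (1 - rho) * np.abs(dF).max()
    results.append((m, change, bound))
    print(f"m={m}: change={change:.2e}, bound={bound:.2e}")
```

The observed change shrinks geometrically in $m$ and stays below the Neumann-series bound, mirroring the role of $\varrho^{\lfloor \delta/\Delta \rfloor + 1}$ in (A.1).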
Lemma A.2. Suppose that $\{x_{n,i} : i \in D_n;\, n \ge 1\}$ is geometrically $L_2$-NED on $\{\varepsilon_i : i \in D_n;\, n \ge 1\}$. Then, under Assumption 3.3(ii), $|\mathrm{Cov}(x_{n,i}, x_{n,j})| \le 32 \left( \max_{1 \le i \le n} \|x_{n,i}\|_2 \right)^2 \varphi(\Delta(i, j)/3)$ for some geometric NED coefficient $\varphi$.
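Proof. Decompose $x_{n,i} = x^{(\delta)}_{n,1,i} + x^{(\delta)}_{n,2,i}$, where
$$x^{(\delta)}_{n,1,i} := \mathbb{E}[x_{n,i} \mid \mathcal{F}_{n,i}(\delta)], \quad \text{and} \quad x^{(\delta)}_{n,2,i} := x_{n,i} - \mathbb{E}[x_{n,i} \mid \mathcal{F}_{n,i}(\delta)].$$
Then, for each pair $x_{n,i}$ and $x_{n,j}$, setting $\delta = \Delta(i, j)/3$,
$$\begin{aligned}
|\mathrm{Cov}(x_{n,i}, x_{n,j})| &= \left| \mathrm{Cov}\left( x^{(\Delta(i,j)/3)}_{n,1,i} + x^{(\Delta(i,j)/3)}_{n,2,i},\; x^{(\Delta(i,j)/3)}_{n,1,j} + x^{(\Delta(i,j)/3)}_{n,2,j} \right) \right| \\
&\le \left| \mathrm{Cov}\left( x^{(\Delta(i,j)/3)}_{n,1,i}, x^{(\Delta(i,j)/3)}_{n,1,j} \right) \right| + \left| \mathrm{Cov}\left( x^{(\Delta(i,j)/3)}_{n,1,i}, x^{(\Delta(i,j)/3)}_{n,2,j} \right) \right| \\
&\quad + \left| \mathrm{Cov}\left( x^{(\Delta(i,j)/3)}_{n,2,i}, x^{(\Delta(i,j)/3)}_{n,1,j} \right) \right| + \left| \mathrm{Cov}\left( x^{(\Delta(i,j)/3)}_{n,2,i}, x^{(\Delta(i,j)/3)}_{n,2,j} \right) \right|.
\end{aligned}$$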
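Since $\{\varepsilon_k : \Delta(i, k) \le \Delta(i, j)/3\}$ and $\{\varepsilon_k : \Delta(j, k) \le \Delta(i, j)/3\}$ do not overlap (by the triangle inequality), the first term on the right-hand side is zero by Assumption 3.3(ii). Note that, by Jensen's inequality, $\|x^{(\Delta(i,j)/3)}_{n,1,i}\|_2 \le \|x_{n,i}\|_2$. In addition, $\|x^{(\Delta(i,j)/3)}_{n,2,i}\|_2 \le 2\|x_{n,i}\|_2$. Then, since $\{x_{n,i}\}$ is assumed to be $L_2$-NED, it holds that
$$\left\| x^{(\Delta(i,j)/3)}_{n,2,i} \right\|_2 = \left\| x_{n,i} - \mathbb{E}[x_{n,i} \mid \mathcal{F}_{n,i}(\Delta(i, j)/3)] \right\|_2 \le 2 \max_{1 \le i \le n} \|x_{n,i}\|_2\, \varphi(\Delta(i, j)/3).$$
Hence, the Cauchy–Schwarz inequality gives
$$\left| \mathrm{Cov}\left( x^{(\Delta(i,j)/3)}_{n,1,i}, x^{(\Delta(i,j)/3)}_{n,2,j} \right) \right| \le 4 \left\| x^{(\Delta(i,j)/3)}_{n,1,i} \right\|_2 \left\| x^{(\Delta(i,j)/3)}_{n,2,j} \right\|_2 \le 8 \left( \max_{1 \le i \le n} \|x_{n,i}\|_2 \right)^2 \varphi(\Delta(i, j)/3),$$
and the third term is bounded in the same way. Similarly,
$$\left| \mathrm{Cov}\left( x^{(\Delta(i,j)/3)}_{n,2,i}, x^{(\Delta(i,j)/3)}_{n,2,j} \right) \right| \le 4 \left\| x^{(\Delta(i,j)/3)}_{n,2,i} \right\|_2 \left\| x^{(\Delta(i,j)/3)}_{n,2,j} \right\|_2 \le 16 \left( \max_{1 \le i \le n} \|x_{n,i}\|_2 \right)^2 \varphi(\Delta(i, j)/3).$$
Summing the three nonzero bounds ($8 + 8 + 16 = 32$) completes the proof.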
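Lemma A.3. Suppose that Assumptions 2.1, 3.1, 3.2(i), 3.3(i)–(ii), and 3.4(i) hold. Then,
(i) $\left\| Z^\top R/n - \mathbb{E}(Z^\top R/n) \right\| \lesssim_P \dfrac{\sqrt{KL}}{\sqrt{n}}$.
If Assumption 3.8 additionally holds, we have
(ii) $\left\| Z^\top \hat{R}/n - \mathbb{E}(Z^\top R/n) \right\| \lesssim_P \dfrac{\sqrt{KL}}{\sqrt{n}} + \kappa^\xi \sqrt{KL}$.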
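Proof. (i) Write $z_i = (z_{1,i}^\top, x_i^\top)^\top = (z_{i,1}, \ldots, z_{i,L+d_x})^\top$. By Assumption 3.2(i),
$$\begin{aligned}
\mathbb{E}\left\| \frac{1}{n} \sum_{i=1}^n \left( z_i r_i^\top - \mathbb{E}(z_i r_i^\top) \right) \right\|^2 &= \frac{1}{n^2} \sum_{k=1}^K \sum_{l=1}^{L+d_x} \mathbb{E}\left\{ \sum_{i=1}^n \left( z_{i,l} r_{i,k} - \mathbb{E}(z_{i,l} r_{i,k}) \right) \right\}^2 \\
&= \frac{1}{n^2} \sum_{k=1}^K \sum_{l=1}^{L+d_x} \sum_{i=1}^n \mathrm{Var}(z_{i,l} r_{i,k}) + \frac{1}{n^2} \sum_{k=1}^K \sum_{l=1}^{L+d_x} \sum_{i=1}^n \sum_{j \ne i} \mathrm{Cov}(z_{i,l} r_{i,k}, z_{j,l} r_{j,k}) \\
&\lesssim \frac{L}{n^2} \sum_{k=1}^K \sum_{i=1}^n \mathbb{E}[r_{i,k}^2] + \frac{L}{n^2} \sum_{k=1}^K \sum_{i=1}^n \sum_{j \ne i} |\mathrm{Cov}(r_{i,k}, r_{j,k})|.
\end{aligned} \tag{A.2}$$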
By the Cauchy–Schwarz inequality, $\left| \int_0^1 q_i(s)\phi_k(s)\,ds \right| \le \|q_i\|_{L_2} \|\phi_k\|_{L_2} < \infty$ by Assumption 3.4(i). Thus,
$$\mathbb{E}[r_{i,k}^2] = \sum_{l=1}^n \sum_{j=1}^n w_{i,j} w_{i,l}\, \mathbb{E}\left[ \int_0^1 q_j(s_1)\phi_k(s_1)\,ds_1 \int_0^1 q_l(s_2)\phi_k(s_2)\,ds_2 \right] < \infty \tag{A.3}$$
uniformly in $i$, implying that the first term on the right-hand side of (A.2) is of order $KL/n$.
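Next, by Lemma A.1, as $\delta \to \infty$,
$$\begin{aligned}
\left\| r_{i,k} - \mathbb{E}[r_{i,k} \mid \mathcal{F}_{n,i}(\delta)] \right\|_2 &= \left\| \sum_{j=1}^n w_{i,j} \int_0^1 q_j(t)\phi_k(t)\,dt - \sum_{j=1}^n w_{i,j} \int_0^1 \mathbb{E}[q_j(t) \mid \mathcal{F}_{n,i}(\delta)]\,\phi_k(t)\,dt \right\|_2 \\
&\le \sum_{j=1}^n |w_{i,j}| \left\| \int_0^1 \left( q_j(t) - \mathbb{E}[q_j(t) \mid \mathcal{F}_{n,i}(\delta)] \right) \phi_k(t)\,dt \right\|_2 \\
&\le \sum_{j=1}^n |w_{i,j}| \int_0^1 \left\| q_j(t) - \mathbb{E}[q_j(t) \mid \mathcal{F}_{n,i}(\delta)] \right\|_2 |\phi_k(t)|\,dt \\
&\le \|W_n\|_\infty C_2 \cdot \int_0^1 |\phi_k(t)|\,dt \cdot \varrho^{\lfloor \delta/\Delta \rfloor + 1} \to 0.
\end{aligned}$$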
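Thus, $\{r_{i,k}\}$ is uniformly and geometrically $L_2$-NED on $\{\varepsilon_i\}$, and from Lemma A.2 and (A.3), $|\mathrm{Cov}(r_{i,k}, r_{j,k})| \lesssim \varphi(\Delta(i, j)/3)$, where $\varphi(\delta)$ is some geometric NED coefficient. Then, Lemma A.1(iii) of Jenish and Prucha (2009) gives
$$\begin{aligned}
\frac{1}{n} \sum_{k=1}^K \sum_{i=1}^n \sum_{j \ne i} |\mathrm{Cov}(r_{i,k}, r_{j,k})| &\lesssim \frac{1}{n} \sum_{k=1}^K \sum_{i=1}^n \sum_{j \ne i} \varphi(\Delta(i, j)/3) = \frac{1}{n} \sum_{k=1}^K \sum_{i=1}^n \sum_{m=1}^\infty \sum_{j:\, \Delta(i,j) \in [m, m+1)} \varphi(\Delta(i, j)/3) \\
&\lesssim \frac{1}{n} \sum_{k=1}^K \sum_{i=1}^n \sum_{m=1}^\infty m^{d-1} \varphi(m) \lesssim K,
\end{aligned}$$
where the last inequality holds because the geometric NED property makes $\sum_{m=1}^\infty m^{d-1}\varphi(m)$ finite. Hence, the second term on the right-hand side of (A.2) is also of order $KL/n$.
Combining these results, by Markov's inequality, we have the desired result.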
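(ii) By the triangle inequality,
$$\left\| Z^\top \hat{R}/n - \mathbb{E}(Z^\top R/n) \right\| \le \left\| Z^\top (\hat{R} - R)/n \right\| + \underbrace{\left\| Z^\top R/n - \mathbb{E}(Z^\top R/n) \right\|}_{\lesssim_P \sqrt{KL}/\sqrt{n} \text{ by (i)}}.$$
For the first term on the right-hand side, observe that
$$Z^\top (\hat{R} - R)/n = \frac{1}{n} \sum_{i=1}^n z_i (\hat{r}_i - r_i)^\top = \frac{1}{n} \sum_{i=1}^n \sum_{j=1}^n z_i w_{i,j} \int_0^1 [\hat{q}_j(t) - q_j(t)]\,\phi_K(t)^\top\,dt,$$
and hence $\left\| Z^\top (\hat{R} - R)/n \right\| \le \frac{1}{n} \sum_{i=1}^n \sum_{j=1}^n \|z_i\| \cdot |w_{i,j}| \cdot \left\| \int_0^1 [\hat{q}_j(t) - q_j(t)]\,\phi_K(t)\,dt \right\|$. Here, define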
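$$\omega^*_i(t) = \begin{cases} 0 & \text{if } t < s_{i,1}, \\ \dfrac{s_{i,l(t)+1} - t}{s_{i,l(t)+1} - s_{i,l(t)}} & \text{if } s_{i,1} \le t \le s_{i,m_i} \text{ with } t \in [s_{i,l(t)}, s_{i,l(t)+1}], \\ 1 & \text{if } t > s_{i,m_i}, \end{cases}$$
such that we can write, for all $t \in [0, 1]$,
$$\hat{q}_i(t) = \omega^*_i(t)[q_i(s_{i,l(t)}) - q_i(s_{i,l(t)+1})] + q_i(s_{i,l(t)+1})$$
(recall: $s_{i,0} = 0$ and $s_{i,m_i+1} = 1$). Thus, under Assumption 3.8,
$$\begin{aligned}
|\hat{q}_i(t) - q_i(t)| &\le \left| \omega^*_i(t)[q_i(s_{i,l(t)}) - q_i(s_{i,l(t)+1})] \right| + \left| q_i(s_{i,l(t)+1}) - q_i(t) \right| \\
&\le \left| s_{i,l(t)} - s_{i,l(t)+1} \right|^\xi + \left| s_{i,l(t)+1} - t \right|^\xi \lesssim \kappa^\xi,
\end{aligned} \tag{A.4}$$
uniformly in $t$, leading to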
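$$\left\| \int_0^1 [\hat{q}_i(t) - q_i(t)]\,\phi_K(t)\,dt \right\| \le \int_0^1 \left\| [\hat{q}_i(t) - q_i(t)]\,\phi_K(t) \right\| dt \lesssim \kappa^\xi \int_0^1 \|\phi_K(t)\|\,dt \lesssim \sqrt{K} \kappa^\xi$$
for all $i$, where the last inequality follows from $\int_0^1 \|\phi_K(t)\|\,dt = \int_0^1 \left( \sum_{k=1}^K \phi_k^2(t) \right)^{1/2} dt \le \left( \sum_{k=1}^K \|\phi_k\|_{L_2}^2 \right)^{1/2}$. Together with $\|z_i\| \lesssim \sqrt{L}$, the first term is therefore of order $\kappa^\xi \sqrt{KL}$.
This completes the proof.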
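The weight $\omega^*_i(t)$ above describes ordinary piecewise-linear interpolation with constant extrapolation outside $[s_{i,1}, s_{i,m_i}]$. The sketch below writes $\hat{q}_i(t)$ literally in the form $\omega^*_i(t)[q_i(s_{i,l(t)}) - q_i(s_{i,l(t)+1})] + q_i(s_{i,l(t)+1})$, compares it with `numpy.interp` (whose default outside the data range is also constant extrapolation), and checks the error bound behind (A.4). The test curve $q(t) = \sqrt{t}$ (Hölder with $\xi = 1/2$ and constant 1) and the reading of $\kappa$ as the largest gap between consecutive points of $0 = s_{i,0} < s_{i,1} < \cdots < s_{i,m_i} < s_{i,m_i+1} = 1$ are assumptions made for illustration.

```python
import numpy as np


def interp_omega(t, s, qs):
    """Evaluate q_hat(t) = omega*(t) [q(s_l) - q(s_{l+1})] + q(s_{l+1}).

    s holds the sorted observation points s_1 < ... < s_m (0-based here) and
    qs holds q(s_1), ..., q(s_m).
    """
    if t < s[0]:
        return qs[0]           # omega* = 0 and s_{l(t)+1} = s_1
    if t > s[-1]:
        return qs[-1]          # omega* = 1, which returns q(s_m)
    l = np.searchsorted(s, t, side="right") - 1  # largest l with s[l] <= t
    l = min(l, len(s) - 2)     # at t == s_m use the last interval
    omega = (s[l + 1] - t) / (s[l + 1] - s[l])
    return omega * (qs[l] - qs[l + 1]) + qs[l + 1]


rng = np.random.default_rng(1)
m = 15
s = np.sort(rng.uniform(0.0, 1.0, size=m))
q = np.sqrt                    # assumed test curve: Hölder with xi = 1/2, constant 1
qs = q(s)

grid = np.linspace(0.0, 1.0, 2001)
q_hat = np.array([interp_omega(t, s, qs) for t in grid])

# kappa read as the largest gap, including the boundary gaps [0, s_1] and [s_m, 1]
kappa = np.diff(np.concatenate(([0.0], s, [1.0]))).max()
max_err = np.abs(q_hat - q(grid)).max()
print(np.allclose(q_hat, np.interp(grid, s, qs)), max_err, 2 * kappa ** 0.5)
```

Since $\omega^*_i(t) \in [0, 1]$, each of the two terms in (A.4) is at most $\kappa^\xi$ here, so the maximum error is bounded by $2\kappa^{1/2}$ in this example.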
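Lemma A.4. Suppose that Assumptions 2.1, 3.1, 3.2, 3.3(i)–(ii), 3.4(i), and 3.5 hold. Then,
(i) $\left\| R^\top M_z R/n - \mathbb{E}R^\top M_z \mathbb{E}R/n \right\| \lesssim_P \dfrac{\sqrt{KL}}{\sqrt{n}}$,
(ii) $\left\| \left[ R^\top M_z R/n \right]^{-1} - \left[ \mathbb{E}R^\top M_z \mathbb{E}R/n \right]^{-1} \right\| \lesssim_P \dfrac{\sqrt{KL}}{\nu_{KL}^2 \sqrt{n}}$.
If Assumption 3.8 additionally holds, we have
(iii) $\left\| \hat{R}^\top M_z \hat{R}/n - \mathbb{E}R^\top M_z \mathbb{E}R/n \right\| \lesssim_P \dfrac{\sqrt{KL}}{\sqrt{n}} + \kappa^\xi \sqrt{KL}$,
(iv) $\left\| \left[ \hat{R}^\top M_z \hat{R}/n \right]^{-1} - \left[ \mathbb{E}R^\top M_z \mathbb{E}R/n \right]^{-1} \right\| \lesssim_P \dfrac{\sqrt{KL}}{\nu_{KL}^2 \sqrt{n}} + \dfrac{\kappa^\xi \sqrt{KL}}{\nu_{KL}^2}$.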
Proof. The proofs are analogous to the proof of Lemma A.7 in Hoshino (2022), and thus are omitted.
Proof of Theorem 3.1
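(i) Letting $U(s) = (u_1(s), \ldots, u_n(s))^\top$, we write
$$Q(s) = R\theta_0(s) + X\beta_0(s) + \mathcal{E}(s) + U(s).$$
Observe that
$$\begin{aligned}
u_i(s) &= \sum_{j=1}^n w_{i,j} \int_0^1 q_j(t)\,\alpha_0(t, s)\,dt - \sum_{k=1}^K r_{i,k}\,\theta_{0k}(s) \\
&= \sum_{j=1}^n w_{i,j} \left[ \int_0^1 q_j(t)\,\alpha_0(t, s)\,dt - \sum_{k=1}^K \int_0^1 q_j(t)\phi_k(t)\,dt \cdot \theta_{0k}(s) \right] \\
&= \sum_{j=1}^n w_{i,j} \int_0^1 q_j(t) \left\{ \alpha_0(t, s) - \sum_{k=1}^K \phi_k(t)\,\theta_{0k}(s) \right\} dt.
\end{aligned}$$
Hence, by the Cauchy–Schwarz inequality,
$$|u_i(s)| \le \|W_n\|_\infty \cdot \max_{1 \le i \le n} \|q_i\|_{L_2} \cdot \left\| \alpha_0(\cdot, s) - \sum_{k=1}^K \phi_k(\cdot)\,\theta_{0k}(s) \right\|_{L_2} \lesssim \ell_K(s) \tag{A.5}$$
uniformly in $i$.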
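Noting that $X^\top S R = X^\top R$, we decompose
$$\begin{aligned}
\hat{\beta}_n(s) - \beta_0(s) &= \left[ X^\top (I_n - S) X \right]^{-1} X^\top (I_n - S) Q(s) - \beta_0(s) \\
&= \left[ X^\top (I_n - S) X \right]^{-1} X^\top (I_n - S) [R\theta_0(s) + \mathcal{E}(s) + U(s)] \\
&= \left[ X^\top (I_n - S) X \right]^{-1} X^\top (I_n - S) \mathcal{E}(s) + \left[ X^\top (I_n - S) X \right]^{-1} X^\top (I_n - S) U(s) \\
&=: A_1 + A_2, \text{ say.}
\end{aligned}$$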
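To make the role of $X^\top S R = X^\top R$ concrete, the sketch below computes $\hat\beta_n(s)$ at a single $s$ from simulated matrices. It relies on assumptions that are not spelled out in this section: $M_z = Z(Z^\top Z)^{-}Z^\top$ is the projection onto the instruments, $X$ is among the columns of $Z$, and $S = M_z R[R^\top M_z R]^{-1}R^\top M_z$, which is consistent with the expression for $X^\top(I_n - S)\mathcal{E}(s)/n$ in terms of $\hat P$ used later in this proof. Under these assumptions the $R\theta_0(s)$ term drops out, so $\hat\beta_n(s)$ recovers $\beta_0(s)$ exactly when $\mathcal{E}(s) + U(s) = 0$. All dimensions and data are arbitrary.

```python
import numpy as np


def proj(A):
    """Orthogonal projection onto the column space of A (pinv tolerates rank deficiency)."""
    return A @ np.linalg.pinv(A.T @ A) @ A.T


def beta_hat(Q, X, Z, R):
    """beta_hat = [X'(I - S)X]^{-1} X'(I - S) Q, with the assumed S = M_z R [R' M_z R]^{-1} R' M_z."""
    Mz = proj(Z)
    S = Mz @ R @ np.linalg.solve(R.T @ Mz @ R, R.T @ Mz)
    I_S = np.eye(len(Q)) - S
    return np.linalg.solve(X.T @ I_S @ X, X.T @ I_S @ Q), S


rng = np.random.default_rng(2)
n, L_extra, K = 200, 6, 3
X = np.column_stack([np.ones(n), rng.normal(size=n)])      # d_x = 2
Z = np.column_stack([X, rng.normal(size=(n, L_extra))])    # instruments include X
R = Z @ rng.normal(size=(Z.shape[1], K)) + rng.normal(size=(n, K))
theta0, beta0 = rng.normal(size=K), np.array([1.0, -0.5])

Q_clean = R @ theta0 + X @ beta0  # outcome at a fixed s with no error term
b, S = beta_hat(Q_clean, X, Z, R)
print(np.allclose(X.T @ S @ R, X.T @ R), np.allclose(b, beta0))
```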
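By a straightforward matrix-norm calculation (see, e.g., Fact A.2 in Hoshino (2022)) and Lemmas A.3(i) and A.4(ii),
$$\begin{aligned}
&\left\| X^\top S X/n - \mathbb{E}(X^\top R/n) \left[ \mathbb{E}R^\top M_z \mathbb{E}R/n \right]^{-1} \mathbb{E}(R^\top X/n) \right\| \\
&\quad \lesssim \rho_{\max}\left( (R^\top X/n)(X^\top R/n) \right) \left\| \left[ R^\top M_z R/n \right]^{-1} - \left[ \mathbb{E}R^\top M_z \mathbb{E}R/n \right]^{-1} \right\| \\
&\qquad + \rho_{\max}\left( \left[ \mathbb{E}R^\top M_z \mathbb{E}R/n \right]^{-1} \right) \left\| R^\top X/n - \mathbb{E}(R^\top X/n) \right\| \\
&\quad \lesssim_P \frac{\sqrt{KL}}{\nu_{KL}^2 \sqrt{n}} + \frac{\sqrt{K}}{\nu_{KL} \sqrt{n}}.
\end{aligned}$$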
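This implies that
$$\left\| X^\top (I_n - S) X/n - \Sigma_{n,x} \right\| \lesssim_P \frac{\sqrt{KL}}{\nu_{KL}^2 \sqrt{n}} \to 0, \tag{A.6}$$
where, recall,
$$\Sigma_{n,x} = X^\top X/n - \mathbb{E}(X^\top R/n) \left[ \mathbb{E}R^\top M_z \mathbb{E}R/n \right]^{-1} \mathbb{E}(R^\top X/n).$$
Under Assumption 3.6, (A.6) ensures that $\rho_{\min}(X^\top (I_n - S) X/n) > 0$ with probability approaching one.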
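Because the eigenvalues of an idempotent matrix are at most one, we obtain by (A.5) that
$$\begin{aligned}
\|A_2\|^2 &= U(s)^\top (I_n - S) X \left[ X^\top (I_n - S) X \right]^{-2} X^\top (I_n - S) U(s) \\
&\lesssim_P U(s)^\top (I_n - S) X \left[ X^\top (I_n - S) X \right]^{-1} X^\top (I_n - S) U(s)/n \lesssim_P \frac{1}{n} \sum_{i=1}^n |u_i(s)|^2 \lesssim \ell_K^2(s).
\end{aligned}$$
Hence, $\|A_2\| \lesssim_P \ell_K(s)$.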
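Next, for $A_1$, decompose $A_1 =: A_{11} + A_{12}$, where
$$\begin{aligned}
A_{11} &:= \Sigma_{n,x}^{-1} \Psi_{n,x}^\top \mathcal{E}(s)/n, \\
A_{12} &:= \left[ X^\top (I_n - S) X/n \right]^{-1} X^\top (I_n - S) \mathcal{E}(s)/n - \Sigma_{n,x}^{-1} \Psi_{n,x}^\top \mathcal{E}(s)/n,
\end{aligned}$$
and recall that
$$\Psi_{n,x} = X - \mathbb{E}(M_z R/n) \left[ \mathbb{E}R^\top M_z \mathbb{E}R/n \right]^{-1} \mathbb{E}(R^\top X/n).$$
By Assumptions 3.3(ii) and 3.6, $\mathbb{E}\|A_{11}\|^2 = \mathrm{tr}\left\{ \Sigma_{n,x}^{-1} \Omega_{n,x} \Sigma_{n,x}^{-1} \right\}/n \lesssim 1/n$. Hence, it follows from Markov's inequality that $\|A_{11}\| \lesssim_P n^{-1/2}$.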
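Next, let
$$\begin{aligned}
P &:= (Z^\top Z/n)^- \mathbb{E}(Z^\top R/n) \left[ \mathbb{E}R^\top M_z \mathbb{E}R/n \right]^{-1} \mathbb{E}(R^\top Z/n) (Z^\top Z/n)^-, \\
\hat{P} &:= (Z^\top Z/n)^- (Z^\top R/n) \left[ R^\top M_z R/n \right]^{-1} (R^\top Z/n) (Z^\top Z/n)^-,
\end{aligned}$$
such that we can write $X^\top (I_n - S) \mathcal{E}(s)/n = X^\top \mathcal{E}(s)/n - (X^\top Z/n) \hat{P} (Z^\top \mathcal{E}(s)/n)$ and $\Psi_{n,x}^\top \mathcal{E}(s)/n = X^\top \mathcal{E}(s)/n - (X^\top Z/n) P (Z^\top \mathcal{E}(s)/n)$. $A_{12}$ can be decomposed further into three terms in the following manner:
$$\begin{aligned}
A_{12} &= \left\{ \left[ X^\top (I_n - S) X/n \right]^{-1} - \Sigma_{n,x}^{-1} \right\} X^\top \mathcal{E}(s)/n \\
&\quad - \left\{ \left[ X^\top (I_n - S) X/n \right]^{-1} - \Sigma_{n,x}^{-1} \right\} (X^\top Z/n) \hat{P} (Z^\top \mathcal{E}(s)/n) - \Sigma_{n,x}^{-1} (X^\top Z/n) \{\hat{P} - P\} (Z^\top \mathcal{E}(s)/n) \\
&=: A_{12a} - A_{12b} - A_{12c}, \text{ say.}
\end{aligned}$$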
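By Markov's inequality, it is easy to observe that $\|X^\top \mathcal{E}(s)/n\| \lesssim_P n^{-1/2}$. Then, by (A.6) and Assumption 3.6, we have $\|A_{12a}\| \lesssim_P \sqrt{KL}/(\nu_{KL}^2 n)$. Similarly, for $A_{12b}$, noting that $\rho_{\max}(\hat{P}) \le \rho_{\max}((Z^\top Z/n)^-) \lesssim 1$,
$$\begin{aligned}
\|A_{12b}\|^2 &\lesssim_P \frac{KL}{\nu_{KL}^4 n} \cdot \mathrm{tr}\left\{ (X^\top Z/n) \hat{P} (Z^\top \mathcal{E}(s)/n) (\mathcal{E}(s)^\top Z/n) \hat{P} (Z^\top X/n) \right\} \\
&\lesssim_P \frac{KL}{\nu_{KL}^4 n} \cdot \left\| (X^\top Z/n)(Z^\top \mathcal{E}(s)/n) \right\|^2.
\end{aligned}$$
Hence, it also holds that $\|A_{12b}\| \lesssim_P \sqrt{KL}/(\nu_{KL}^2 n)$ by Assumptions 3.3(ii) and (iii).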
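Here, note that
$$\left\| \hat{P} - P \right\| \lesssim_P \frac{\sqrt{KL}}{\nu_{KL}^2 \sqrt{n}}$$
by Lemmas A.3(i) and A.4(ii). Thus, for $A_{12c}$, we have
$$\|A_{12c}\|^2 \lesssim \left\| \Sigma_{n,x}^{-1} (X^\top Z/n) \{\hat{P} - P\} \right\|^2 \cdot \left\| Z^\top \mathcal{E}(s)/n \right\|^2,$$
implying that $\|A_{12c}\| \lesssim_P L\sqrt{K}/(\nu_{KL}^2 n)$. Combining these results, we obtain the desired result under $L\sqrt{K}/(\nu_{KL}^2 \sqrt{n}) \lesssim 1$.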
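(ii) Note that $R_x^\top M_z R_x/n = R^\top (M_z - M_x) R/n$. Then,
$$\left\| R_x^\top M_z R_x/n - \mathbb{E}R_x^\top M_z \mathbb{E}R_x/n \right\| \le \left\| R^\top M_z R/n - \mathbb{E}R^\top M_z \mathbb{E}R/n \right\| + \left\| R^\top M_x R/n - \mathbb{E}R^\top M_x \mathbb{E}R/n \right\| = o_P(1)$$
by Lemma A.4(i). With this, Assumption 3.5(ii) implies that $\rho_{\min}(R_x^\top M_z R_x/n) \ge c\,\nu_{KL}$ with probability approaching one for some $0 < c < 1$. Thus, by Weyl's inequality,
$$\rho_{\min}\left( R_x^\top M_z R_x/n + \lambda D \right) \ge c[\nu_{KL} + \lambda \rho_D]$$
with probability approaching one. Now, decompose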
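$$\begin{aligned}
\hat{\theta}_n(s) - \theta_0(s) &= \left[ R_x^\top M_z R_x + \lambda D_n \right]^{-1} R_x^\top M_z Q(s) - \theta_0(s) \\
&= \left[ R_x^\top M_z R_x + \lambda D_n \right]^{-1} R_x^\top M_z R\theta_0(s) - \theta_0(s) \\
&\quad + \left[ R_x^\top M_z R_x + \lambda D_n \right]^{-1} R_x^\top M_z \mathcal{E}(s) + \left[ R_x^\top M_z R_x + \lambda D_n \right]^{-1} R_x^\top M_z U(s) \\
&=: B_1 + B_2 + B_3, \text{ say.}
\end{aligned}$$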
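Noting that $R_x^\top M_z R_x = R_x^\top M_z R$ and
$$B_1 = \left[ R_x^\top M_z R_x + \lambda D_n \right]^{-1} \left[ R_x^\top M_z R + \lambda D_n - \lambda D_n \right] \theta_0(s) - \theta_0(s) = -\lambda \hat{\Sigma}_{n,r,\lambda}^{-1} D\theta_0(s),$$
where $\hat{\Sigma}_{n,r,\lambda} := R_x^\top M_z R_x/n + \lambda D$, we have
$$\|B_1\|^2 = \lambda^2 \cdot \theta_0(s)^\top D \hat{\Sigma}_{n,r,\lambda}^{-2} D\theta_0(s) \lesssim_P \frac{\lambda^2 \|\theta_0(s)\|_D^2}{(\nu_{KL} + \lambda\rho_D)^2}.$$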
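The closed form for $B_1$ can be checked numerically. The sketch below computes $\hat\theta_n(s) = [R_x^\top M_z R_x + \lambda D_n]^{-1} R_x^\top M_z Q(s)$ on simulated data with no error term, so that $\hat\theta_n(s) - \theta_0(s) = B_1$. It assumes, for illustration only, that $R_x = (I_n - M_x)R$ with $M_x$ the projection onto $X$ (consistent with $R_x^\top M_z R_x = R^\top(M_z - M_x)R$ when $X$ is among the instruments), that $D_n = nD$ (so that $\hat\Sigma_{n,r,\lambda} = R_x^\top M_z R_x/n + \lambda D$ as defined above), and that $D$ is an arbitrary positive definite matrix.

```python
import numpy as np


def proj(A):
    """Orthogonal projection onto the column space of A."""
    return A @ np.linalg.pinv(A.T @ A) @ A.T


rng = np.random.default_rng(3)
n, K, lam = 300, 4, 0.7
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = np.column_stack([X, rng.normal(size=(n, 6))])   # X is part of the instruments
R = Z @ rng.normal(size=(Z.shape[1], K)) + rng.normal(size=(n, K))
C = rng.normal(size=(K, K))
D = C @ C.T + np.eye(K)                             # hypothetical penalty matrix (positive definite)
theta0, beta0 = rng.normal(size=K), rng.normal(size=2)

Mz, Mx = proj(Z), proj(X)
Rx = (np.eye(n) - Mx) @ R                           # assumed definition of R_x
Q = R @ theta0 + X @ beta0                          # error-free outcome at a fixed s

theta_hat = np.linalg.solve(Rx.T @ Mz @ Rx + lam * n * D, Rx.T @ Mz @ Q)  # assumes D_n = n D
Sigma_hat = Rx.T @ Mz @ Rx / n + lam * D
B1 = -lam * np.linalg.solve(Sigma_hat, D @ theta0)

print(np.allclose(Rx.T @ Mz @ Rx, Rx.T @ Mz @ R), np.allclose(theta_hat - theta0, B1))
```

In this noise-free setting the only deviation of $\hat\theta_n(s)$ from $\theta_0(s)$ is the ridge-type shrinkage bias $-\lambda\hat\Sigma_{n,r,\lambda}^{-1}D\theta_0(s)$, which vanishes when $\lambda = 0$.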
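Here, for any matrices $A$ and $B$ such that $A^\top A$ is nonsingular and $B$ is symmetric and positive semidefinite, it holds that$^{\mathrm{i}}$
$$\begin{aligned}
A[A^\top A]^{-1}A^\top - A[A^\top A + B]^{-1}A^\top &= A\left( [A^\top A]^{-1} - [A^\top A + B]^{-1} \right) A^\top \\
&= A\left( [A^\top A + B]^{-1} \left[ (A^\top A + B) - (A^\top A) \right] [A^\top A]^{-1} \right) A^\top \\
&= \underbrace{A\left( [A^\top A + B]^{-1} B [A^\top A]^{-1} \right) A^\top}_{\text{symmetric positive semidefinite}},
\end{aligned}$$
which, since $A[A^\top A]^{-1}A^\top$ is a projection matrix, implies that
$$\rho_{\max}\left( A[A^\top A + B]^{-1}A^\top \right) \le 1. \tag{A.7}$$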
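From this with $A = M_z R_x/\sqrt{n}$ and $B = \lambda D$, we can easily observe that $\|B_3\| \lesssim_P \ell_K(s)/\sqrt{\nu_{KL} + \lambda\rho_D}$.
For $B_2$, decompose $B_2 =: B_{21} + B_{22} + B_{23}$, where
$$\begin{aligned}
B_{21} &:= \Sigma_{n,r,\lambda}^{-1} \mathbb{E}R_x^\top M_z \mathcal{E}(s)/n, \\
B_{22} &:= \left\{ \hat{\Sigma}_{n,r,\lambda}^{-1} - \Sigma_{n,r,\lambda}^{-1} \right\} \mathbb{E}R_x^\top M_z \mathcal{E}(s)/n, \\
B_{23} &:= \hat{\Sigma}_{n,r,\lambda}^{-1} \{R_x - \mathbb{E}R_x\}^\top M_z \mathcal{E}(s)/n.
\end{aligned}$$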
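Note that (A.7) implies the following:
$$\begin{aligned}
\rho_{\max}\left( [A^\top A + B]^{-1} A^\top A [A^\top A + B]^{-1} \right) &= \rho_{\max}\left( [A^\top A + B]^{-1} A^\top A (A^\top A)^{-1} A^\top A [A^\top A + B]^{-1} \right) \\
&= \rho_{\max}\left( (A^\top A)^{-1/2} A^\top A [A^\top A + B]^{-2} A^\top A (A^\top A)^{-1/2} \right) \\
&\le \rho_{\max}\left( [A^\top A + B]^{-1} \right).
\end{aligned} \tag{A.8}$$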
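Using this, we obtain $\|B_{21}\| \lesssim_P \sqrt{K}/\sqrt{n(\nu_{KL} + \lambda\rho_D)}$. Next, by the same argument as in Lemma A.4(ii), we have
$$\left\| \hat{\Sigma}_{n,r,\lambda}^{-1} - \Sigma_{n,r,\lambda}^{-1} \right\| \lesssim_P \frac{\sqrt{KL}}{\sqrt{n}(\nu_{KL} + \lambda\rho_D)^2} \to 0.$$
Thus, by Markov's inequality, $\|B_{22}\| = o_P(\sqrt{K}/\sqrt{n})$.

$^{\mathrm{i}}$ The symmetry can be confirmed by $A\left( [A^\top A + B]^{-1} B [A^\top A]^{-1} \right) A^\top = A\left( [A^\top A]^{-1} B [A^\top A + B]^{-1} \right) A^\top$.

Finally, it is straightforward to observe that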
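$$\begin{aligned}
\|B_{23}\| &\le \left\| \hat{\Sigma}_{n,r,\lambda}^{-1} \left\{ R_x^\top Z/n - \mathbb{E}R_x^\top Z/n \right\} \right\| \cdot \left\| (Z^\top Z/n)^{-1} Z^\top \mathcal{E}(s)/n \right\| \\
&\lesssim_P \left( \frac{\sqrt{KL}}{\sqrt{n}(\nu_{KL} + \lambda\rho_D)} \right) \cdot \left( \frac{\sqrt{L}}{\sqrt{n}} \right) = o_P\left( \frac{\sqrt{K}}{\sqrt{n(\nu_{KL} + \lambda\rho_D)}} \right)
\end{aligned}$$
by Lemma A.3(i), where the last equality is due to $L/\sqrt{n(\nu_{KL} + \lambda\rho_D)} = o(1)$. Combining these results, we obtain
$$\left\| \hat{\theta}_n(s) - \theta_0(s) \right\| \lesssim_P \frac{\sqrt{K}}{\sqrt{n}} + \frac{\ell_K(s)}{\sqrt{\nu_{KL} + \lambda\rho_D}} + \frac{\lambda \|\theta_0(s)\|_D}{\nu_{KL} + \lambda\rho_D}. \tag{A.9}$$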
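By the triangle inequality,
$$\begin{aligned}
\left\| \hat{\alpha}_n(\cdot, s) - \alpha_0(\cdot, s) \right\|_{L_2} &\le \left\| \phi_K(\cdot)^\top \theta_0(s) - \alpha_0(\cdot, s) \right\|_{L_2} + \left\| \phi_K(\cdot)^\top (\hat{\theta}_n(s) - \theta_0(s)) \right\|_{L_2} \\
&\lesssim \ell_K(s) + \left\| \phi_K(\cdot)^\top (\hat{\theta}_n(s) - \theta_0(s)) \right\|_{L_2}.
\end{aligned}$$
Under Assumption 3.4(iii), we have
$$\left\| \phi_K(\cdot)^\top (\hat{\theta}_n(s) - \theta_0(s)) \right\|_{L_2}^2 = (\hat{\theta}_n(s) - \theta_0(s))^\top \left\{ \int_0^1 \phi_K(t)\phi_K(t)^\top dt \right\} (\hat{\theta}_n(s) - \theta_0(s)) \lesssim \left\| \hat{\theta}_n(s) - \theta_0(s) \right\|^2.$$
Then, the proof is completed in view of (A.9).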
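The matrix facts (A.7) and (A.8) used in this proof hold for any $A$ with $A^\top A$ nonsingular and any symmetric positive semidefinite $B$. A quick numerical spot check on randomly generated matrices is sketched below; the dimensions, the seed, and the scaling of $B$ are arbitrary, and this is a sanity check rather than a proof.

```python
import numpy as np

rng = np.random.default_rng(4)
n, K = 50, 6
checks = []
for _ in range(200):
    A = rng.normal(size=(n, K))
    C = rng.normal(size=(K, K))
    B = rng.uniform(0.0, 3.0) * (C @ C.T)      # symmetric positive semidefinite
    G_inv = np.linalg.inv(A.T @ A + B)         # [A'A + B]^{-1}

    # (A.7): the largest eigenvalue of A [A'A + B]^{-1} A' is at most one.
    lhs7 = np.linalg.eigvalsh(A @ G_inv @ A.T).max()
    # (A.8): rho_max([A'A+B]^{-1} A'A [A'A+B]^{-1}) <= rho_max([A'A+B]^{-1}).
    lhs8 = np.linalg.eigvalsh(G_inv @ A.T @ A @ G_inv).max()
    rhs8 = np.linalg.eigvalsh(G_inv).max()
    checks.append((lhs7, lhs8, rhs8))

print(max(c[0] for c in checks), all(c[1] <= c[2] * (1 + 1e-9) for c in checks))
```

Both inequalities become equalities when $B = 0$, which is why the check uses a small relative tolerance.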
Proof of Theorem 3.2
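(i) Using the notation in the proof of Theorem 3.1(i), we have $\sqrt{n}(\hat{\beta}_n(s) - \beta_0(s)) = \sqrt{n}A_1 + \sqrt{n}A_2$. Recalling that $\|A_2\| \lesssim_P \ell_K(s)$, $\sqrt{n}A_2 = o_P(1)$ holds by assumption. Further, as shown in the proof of Theorem 3.1(i), $A_1 = A_{11} + A_{12a} - A_{12b} - A_{12c}$ with
$$\|A_{12a}\| \lesssim_P \frac{\sqrt{KL}}{\nu_{KL}^2 n}, \qquad \|A_{12b}\| \lesssim_P \frac{\sqrt{KL}}{\nu_{KL}^2 n}, \qquad \|A_{12c}\| \lesssim_P \frac{L\sqrt{K}}{\nu_{KL}^2 n}.$$
Hence, $\sqrt{n}A_1 = \sqrt{n}A_{11} + o_P(1)$ under the assumptions made here. Here, let $c$ be an arbitrary $d_x \times 1$ vector such that $\|c\| = 1$, and let
$$a_n := c^\top \Lambda_{n,x}^{-1/2} \underbrace{\Sigma_{n,x}^{-1} \Psi_{n,x}^\top \mathcal{E}_n(s)/\sqrt{n}}_{\sqrt{n}A_{11}},$$
where $\Lambda_{n,x} := \Sigma_{n,x}^{-1} \Omega_{n,x} \Sigma_{n,x}^{-1}$. Below, we show that
$$a_n \xrightarrow{d} N(0, 1),$$
which implies the desired result. Define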
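$$\tilde{a}_{n,i} := n^{-1/2} c^\top \Lambda_{n,x}^{-1/2} \Sigma_{n,x}^{-1} (x_i - \tilde{z}_i)\,\varepsilon_i(s),$$
where $\tilde{z}_i := (X^\top Z/n) P z_i$. Then, $a_n = \sum_{i=1}^n \tilde{a}_{n,i}$ holds with $\mathbb{E}\tilde{a}_{n,i} = 0$ and $\sum_{i=1}^n \mathbb{E}(\tilde{a}_{n,i}^2) = 1$. Letting
$$\tilde{a}_{n,1,i} := n^{-1/2} c^\top \Lambda_{n,x}^{-1/2} \Sigma_{n,x}^{-1} x_i \varepsilon_i(s) \quad \text{and} \quad \tilde{a}_{n,2,i} := -n^{-1/2} c^\top \Lambda_{n,x}^{-1/2} \Sigma_{n,x}^{-1} \tilde{z}_i \varepsilon_i(s),$$
so that $\tilde{a}_{n,i} = \tilde{a}_{n,1,i} + \tilde{a}_{n,2,i}$, by the $c_r$-inequality, $\mathbb{E}(\tilde{a}_{n,i}^4) \le 8\mathbb{E}(\tilde{a}_{n,1,i}^4) + 8\mathbb{E}(\tilde{a}_{n,2,i}^4)$ holds, where
$$\mathbb{E}(\tilde{a}_{n,1,i}^4) \lesssim n^{-2} c^\top \Lambda_{n,x}^{-1/2} \Sigma_{n,x}^{-1} x_i x_i^\top \Sigma_{n,x}^{-1} \Lambda_{n,x}^{-1/2} c c^\top \Lambda_{n,x}^{-1/2} \Sigma_{n,x}^{-1} x_i x_i^\top \Sigma_{n,x}^{-1} \Lambda_{n,x}^{-1/2} c \lesssim n^{-2},$$
and
$$\mathbb{E}(\tilde{a}_{n,2,i}^4) \lesssim n^{-2} c^\top \Lambda_{n,x}^{-1/2} \Sigma_{n,x}^{-1} \tilde{z}_i \tilde{z}_i^\top \Sigma_{n,x}^{-1} \Lambda_{n,x}^{-1/2} c c^\top \Lambda_{n,x}^{-1/2} \Sigma_{n,x}^{-1} \tilde{z}_i \tilde{z}_i^\top \Sigma_{n,x}^{-1} \Lambda_{n,x}^{-1/2} c \lesssim n^{-2} \|\tilde{z}_i\|^4 \lesssim L^2/n^2$$
under Assumption 3.3(iii). Hence, $\sum_{i=1}^n \mathbb{E}(\tilde{a}_{n,i}^4) \lesssim L^2/n \to 0$. Then, applying Lyapunov's central limit theorem completes the proof.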
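(ii) First, by Weyl's inequality,
$$\begin{aligned}
\rho_{\min}\left( [A^\top A + B]^{-1} A^\top A [A^\top A + B]^{-1} \right) &= \rho_{\min}\left( [A^\top A + B]^{-1} - [A^\top A + B]^{-1} B [A^\top A + B]^{-1} \right) \\
&\ge \rho_{\min}\left( [A^\top A + B]^{-1} \right) - \rho_{\max}\left( [A^\top A + B]^{-1} B [A^\top A + B]^{-1} \right).
\end{aligned} \tag{A.10}$$
Then, setting $A = M_z \mathbb{E}R_x/\sqrt{n}$ and $B = \lambda D$, by Assumption 3.3(iii) we have
$$\begin{aligned}
[\sigma_{n,\lambda}(t, s)]^2 &= \phi_K(t)^\top \Sigma_{n,r,\lambda}^{-1} \Omega_{n,r}(s) \Sigma_{n,r,\lambda}^{-1} \phi_K(t) \\
&\ge c_1 \cdot \phi_K(t)^\top \left[ \mathbb{E}R_x^\top M_z \mathbb{E}R_x/n + \lambda D \right]^{-1} (\mathbb{E}R_x^\top M_z \mathbb{E}R_x/n) \left[ \mathbb{E}R_x^\top M_z \mathbb{E}R_x/n + \lambda D \right]^{-1} \phi_K(t) \\
&\ge \left\{ c_2/(1 + \lambda) - c_3 \lambda/(\nu_{KL} + \lambda\rho_D)^2 \right\} \cdot \|\phi_K(t)\|^2 \\
&\ge c_4 \cdot \|\phi_K(t)\|^2
\end{aligned} \tag{A.11}$$
for a sufficiently large $n$ under the assumption $\lambda/\nu_{KL}^2 \to 0$, where $c_1, \ldots, c_4$ are some fixed constants.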
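The first line of (A.10) rests on the identity $[A^\top A + B]^{-1}A^\top A[A^\top A + B]^{-1} = [A^\top A + B]^{-1} - [A^\top A + B]^{-1}B[A^\top A + B]^{-1}$, which follows from writing $A^\top A = (A^\top A + B) - B$. The sketch below verifies this identity and the resulting Weyl-type lower bound on random matrices; the dimensions, the seed, and the scale of $B$ are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(5)
n, K = 40, 5
A = rng.normal(size=(n, K))
C = rng.normal(size=(K, K))
B = 0.5 * C @ C.T                        # symmetric positive semidefinite
G_inv = np.linalg.inv(A.T @ A + B)       # [A'A + B]^{-1}

# Identity behind the first line of (A.10).
lhs = G_inv @ A.T @ A @ G_inv
rhs = G_inv - G_inv @ B @ G_inv
identity_holds = np.allclose(lhs, rhs)

# Weyl's inequality: rho_min(P - N) >= rho_min(P) - rho_max(N).
rho_min_lhs = np.linalg.eigvalsh(lhs).min()
lower_bound = np.linalg.eigvalsh(G_inv).min() - np.linalg.eigvalsh(G_inv @ B @ G_inv).max()
print(identity_holds, rho_min_lhs >= lower_bound - 1e-12)
```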
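We can write
$$\frac{\sqrt{n}(\hat{\alpha}_n(t, s) - \alpha_0(t, s))}{\sigma_{n,\lambda}(t, s)} = \frac{\sqrt{n}\phi_K(t)^\top (B_1 + B_{21} + B_{22} + B_{23} + B_3)}{\sigma_{n,\lambda}(t, s)} + \frac{\sqrt{n}(\phi_K(t)^\top \theta_0(s) - \alpha_0(t, s))}{\sigma_{n,\lambda}(t, s)}.$$
Recalling that
$$\|B_1\| \lesssim_P \frac{\lambda \|\theta_0(s)\|_D}{\nu_{KL} + \lambda\rho_D}, \quad \|B_{22}\| \lesssim_P \frac{K\sqrt{L}}{n(\nu_{KL} + \lambda\rho_D)^2}, \quad \|B_{23}\| \lesssim_P \frac{L\sqrt{K}}{n(\nu_{KL} + \lambda\rho_D)}, \quad \|B_3\| \lesssim_P \frac{\ell_K(s)}{\sqrt{\nu_{KL} + \lambda\rho_D}},$$
we find, considering (A.11) under the assumptions introduced here, that the dominant term of $\sqrt{n}(\hat{\alpha}_n(t, s) - \alpha_0(t, s))/\sigma_{n,\lambda}(t, s)$ is $\sqrt{n}\phi_K(t)^\top B_{21}/\sigma_{n,\lambda}(t, s)$.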
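Let
$$b_n := [\sigma_{n,\lambda}(t, s)]^{-1} \phi_K(t)^\top \underbrace{\Sigma_{n,r,\lambda}^{-1} \mathbb{E}R_x^\top M_z \mathcal{E}(s)/\sqrt{n}}_{\sqrt{n}B_{21}}, \qquad \tilde{b}_{n,i} := n^{-1/2} [\sigma_{n,\lambda}(t, s)]^{-1} \phi_K(t)^\top \Sigma_{n,r,\lambda}^{-1} (\mathbb{E}R_x^\top Z/n)(Z^\top Z/n)^{-1} z_i \varepsilon_i(s),$$
such that $b_n = \sum_{i=1}^n \tilde{b}_{n,i}$, $\mathbb{E}\tilde{b}_{n,i} = 0$, and $\sum_{i=1}^n \mathbb{E}(\tilde{b}_{n,i}^2) = 1$ hold. Observe that
$$\begin{aligned}
\mathbb{E}(\tilde{b}_{n,i}^4) &\lesssim \frac{1}{n^2 [\sigma_{n,\lambda}(t, s)]^4} \left( z_i^\top (Z^\top Z/n)^{-1} (Z^\top \mathbb{E}R_x/n) \Sigma_{n,r,\lambda}^{-1} \phi_K(t)\phi_K(t)^\top \Sigma_{n,r,\lambda}^{-1} (\mathbb{E}R_x^\top Z/n)(Z^\top Z/n)^{-1} z_i \right)^2 \\
&\lesssim \frac{1}{n^2} \left( z_i^\top (Z^\top Z/n)^{-1} (Z^\top \mathbb{E}R_x/n) \Sigma_{n,r,\lambda}^{-2} (\mathbb{E}R_x^\top Z/n)(Z^\top Z/n)^{-1} z_i \right)^2 \\
&\lesssim \frac{\|z_i\|^2}{n^2} z_i^\top (Z^\top Z/n)^{-1} (Z^\top \mathbb{E}R_x/n) \Sigma_{n,r,\lambda}^{-2} (\mathbb{E}R_x^\top M_z \mathbb{E}R_x/n) \Sigma_{n,r,\lambda}^{-2} (\mathbb{E}R_x^\top Z/n)(Z^\top Z/n)^{-1} z_i \\
&\lesssim \frac{\|z_i\|^2}{n^2 (\nu_{KL} + \lambda\rho_D)} z_i^\top (Z^\top Z/n)^{-1} (Z^\top \mathbb{E}R_x/n) \Sigma_{n,r,\lambda}^{-2} (\mathbb{E}R_x^\top Z/n)(Z^\top Z/n)^{-1} z_i \\
&\lesssim \frac{\|z_i\|^2}{n^2 (\nu_{KL} + \lambda\rho_D)^2} z_i^\top (Z^\top Z/n)^{-1} (Z^\top \mathbb{E}R_x/n) \Sigma_{n,r,\lambda}^{-1} (\mathbb{E}R_x^\top Z/n)(Z^\top Z/n)^{-1} z_i \\
&\lesssim \frac{\|z_i\|^2}{n^2 (\nu_{KL} + \lambda\rho_D)^2} z_i^\top (Z^\top Z/n)^{-1} z_i \\
&\lesssim \frac{L^2}{n^2 (\nu_{KL} + \lambda\rho_D)^2},
\end{aligned}$$
where we have used (A.8) with $A = M_z \mathbb{E}R_x/\sqrt{n}$ and $B = \lambda D$ in the fourth inequality and (A.7) with $A = (Z^\top Z/n)^{-1/2}(Z^\top \mathbb{E}R_x/n)$ and $B = \lambda D$ in the sixth inequality. This shows that $\sum_{i=1}^n \mathbb{E}(\tilde{b}_{n,i}^4) \to 0$, and the result follows from Lyapunov's central limit theorem.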
(iii), (iv) We only prove that $Z^\top \hat{V}_n(s) Z/n$ converges in probability to $Z^\top V_n(s) Z/n$; the consistency of $X^\top \hat{V}_n(s) X/n$ is analogous. The consistency of the other parts has already been proved in the preceding arguments. Let $\tilde{V}_n(s) := \mathrm{diag}\{\varepsilon_1^2(s), \ldots, \varepsilon_n^2(s)\}$. By the triangle inequality,
$$\left\| Z^\top \hat{V}_n(s) Z/n - Z^\top V_n(s) Z/n \right\| \le \left\| Z^\top [\hat{V}_n(s) - \tilde{V}_n(s)] Z/n \right\| + \left\| Z^\top [\tilde{V}_n(s) - V_n(s)] Z/n \right\|.$$
Under Assumptions 3.3(ii) and (iii), by Markov's inequality, it is easy to observe that the second term on the right-hand side is of order $L/\sqrt{n}$.
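Write $\hat{\varepsilon}_i(s) = \varepsilon_i(s) + t_{n,r,i}(s) + t_{n,x,i}(s)$, where
$$t_{n,r,i}(s) := \int_0^1 q_i(t)[\alpha_0(t, s) - \check{\alpha}_n(t, s)]\,dt, \qquad t_{n,x,i}(s) := x_i^\top (\beta_0(s) - \hat{\beta}_n(s)),$$
where $\check{\alpha}_n(t, s) := \phi_K(t)^\top \check{\theta}_n(s)$. By Theorem 3.1(i), we have $|t_{n,x,i}(s)| \lesssim_P n^{-1/2}$ uniformly in $i$. For $t_{n,r,i}(s)$, noting that $\left| \int_0^1 q_i(t)[\alpha_0(t, s) - \check{\alpha}_n(t, s)]\,dt \right| \lesssim \|\alpha_0(\cdot, s) - \check{\alpha}_n(\cdot, s)\|_{L_2}$ by the Cauchy–Schwarz inequality, Theorem 3.1(ii) gives $|t_{n,r,i}(s)| \lesssim_P \sqrt{K}/\sqrt{n\nu_{KL}}$ uniformly in $i$. As $\hat{\varepsilon}_i^2(s) - \varepsilon_i^2(s) = t_{n,r,i}^2(s) + t_{n,x,i}^2(s) + 2t_{n,r,i}(s)\varepsilon_i(s) + 2t_{n,x,i}(s)\varepsilon_i(s) + 2t_{n,r,i}(s)t_{n,x,i}(s)$, we can decompose
$$Z^\top [\hat{V}_n(s) - \tilde{V}_n(s)] Z/n = \gamma_{n1} + \gamma_{n2} + 2\gamma_{n3} + 2\gamma_{n4} + 2\gamma_{n5},$$
where $\gamma_{n1} := n^{-1}\sum_{i=1}^n z_i z_i^\top t_{n,r,i}^2(s)$, $\gamma_{n2} := n^{-1}\sum_{i=1}^n z_i z_i^\top t_{n,x,i}^2(s)$, $\gamma_{n3} := n^{-1}\sum_{i=1}^n z_i z_i^\top \varepsilon_i(s) t_{n,r,i}(s)$, $\gamma_{n4} := n^{-1}\sum_{i=1}^n z_i z_i^\top \varepsilon_i(s) t_{n,x,i}(s)$, and $\gamma_{n5} := n^{-1}\sum_{i=1}^n z_i z_i^\top t_{n,r,i}(s) t_{n,x,i}(s)$. Then, by Markov's inequality, we have
$$\|\gamma_{n1}\| \lesssim_P \frac{K\sqrt{L}}{n\nu_{KL}}, \quad \|\gamma_{n2}\| \lesssim_P \frac{\sqrt{L}}{n}, \quad \|\gamma_{n3}\| \lesssim_P \frac{\sqrt{KL}}{\sqrt{n\nu_{KL}}}, \quad \|\gamma_{n4}\| \lesssim_P \frac{\sqrt{L}}{\sqrt{n}}, \quad \|\gamma_{n5}\| \lesssim_P \frac{\sqrt{KL}}{n\sqrt{\nu_{KL}}}.$$
This completes the proof.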
Lemma A.5. Under the assumptions made in Theorem 3.3, we have
$$\frac{nB_{21}^\top \Phi_I B_{21} - \mu_n}{\sqrt{v_n}} \xrightarrow{d} N(0, 1).$$
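Proof. Recalling that $\Xi_n := \Sigma_{n,r,\lambda}^{-1} \mathbb{E}(R_x^\top Z)(Z^\top Z)^-$, observe
$$\begin{aligned}
\mathbb{E}[nB_{21}^\top \Phi_I B_{21}] &= \mathbb{E}\left[ \mathcal{E}(s)^\top Z (Z^\top Z)^- \mathbb{E}(Z^\top R_x) \Sigma_{n,r,\lambda}^{-1} \Phi_I \Sigma_{n,r,\lambda}^{-1} \mathbb{E}(R_x^\top Z)(Z^\top Z)^- Z^\top \mathcal{E}(s) \right]/n \\
&= \mathrm{tr}\left\{ \mathbb{E}[\Xi_n^\top \Phi_I \Xi_n Z^\top \mathcal{E}(s)\mathcal{E}(s)^\top Z]/n \right\} \\
&= \mathrm{tr}\left\{ \Xi_n^\top \Phi_I \Xi_n (Z^\top V_n(s) Z/n) \right\} = \mu_n.
\end{aligned}$$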
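Letting $\pi_{n,i,j} := z_i^\top \Xi_n^\top \Phi_I \Xi_n z_j$, we can write
$$nB_{21}^\top \Phi_I B_{21} = \frac{1}{n} \sum_{i=1}^n \sum_{j=1}^n \pi_{n,i,j} \varepsilon_i(s)\varepsilon_j(s) = \frac{2}{n} \sum_{1 \le i < j \le n} \pi_{n,i,j} \varepsilon_i(s)\varepsilon_j(s) + \underbrace{\frac{1}{n} \sum_{i=1}^n \pi_{n,i,i} \varepsilon_i^2(s)}_{= \mathrm{tr}\{\Xi_n^\top \Phi_I \Xi_n (Z^\top \tilde{V}_n(s) Z/n)\}}.$$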
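Here, we have
$$\begin{aligned}
\|\Xi_n^\top \Phi_I \Xi_n\|^2 &\lesssim \mathrm{tr}\left\{ \Xi_n \Xi_n^\top \Xi_n \Xi_n^\top \right\} \\
&\lesssim \mathrm{tr}\left\{ \Sigma_{n,r,\lambda}^{-1} (\mathbb{E}R_x^\top M_z \mathbb{E}R_x/n) \Sigma_{n,r,\lambda}^{-1} \Sigma_{n,r,\lambda}^{-1} (\mathbb{E}R_x^\top M_z \mathbb{E}R_x/n) \Sigma_{n,r,\lambda}^{-1} \right\} \lesssim \frac{K}{(\nu_{KL} + \lambda\rho_D)^2}
\end{aligned}$$
by (A.8). Further,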
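$$\begin{aligned}
\left| \frac{1}{n} \sum_{i=1}^n \pi_{n,i,i} \varepsilon_i^2(s) - \mu_n \right| &= \left| \mathrm{tr}\left\{ \Xi_n^\top \Phi_I \Xi_n \left( Z^\top [\tilde{V}_n(s) - V_n(s)] Z/n \right) \right\} \right| \\
&\le \left\| \Xi_n^\top \Phi_I \Xi_n \right\| \cdot \left\| Z^\top [\tilde{V}_n(s) - V_n(s)] Z/n \right\| \lesssim_P \frac{L\sqrt{K}}{\sqrt{n}(\nu_{KL} + \lambda\rho_D)}
\end{aligned}$$
as in the proof of Theorem 3.2(iii), (iv).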
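Meanwhile, for a sufficiently large $n$,
$$\begin{aligned}
v_n &= 2\,\mathrm{tr}\left\{ \Xi_n^\top \Phi_I \Xi_n (Z^\top V_n(s) Z/n) \Xi_n^\top \Phi_I \Xi_n (Z^\top V_n(s) Z/n) \right\} \\
&\ge c_1\,\mathrm{tr}\left\{ (Z^\top V_n(s) Z/n) \Xi_n^\top \Xi_n (Z^\top V_n(s) Z/n) \Xi_n^\top \Xi_n \right\} \\
&\ge c_2\,\mathrm{tr}\left\{ \Xi_n (Z^\top Z/n) \Xi_n^\top \Xi_n (Z^\top Z/n) \Xi_n^\top \right\} \\
&= c_2\,\mathrm{tr}\left\{ \Sigma_{n,r,\lambda}^{-1} (\mathbb{E}R_x^\top M_z \mathbb{E}R_x/n) \Sigma_{n,r,\lambda}^{-1} \Sigma_{n,r,\lambda}^{-1} (\mathbb{E}R_x^\top M_z \mathbb{E}R_x/n) \Sigma_{n,r,\lambda}^{-1} \right\} \\
&\ge \left\{ c_3/(1 + \lambda) - c_4 \lambda/(\nu_{KL} + \lambda\rho_D)^2 \right\}^2 K \\
&\ge c_5 K > 0
\end{aligned} \tag{A.12}$$
for some constants $c_1, \ldots, c_5$, by Assumptions 3.3(iii) and 3.7(ii) and (A.10). These imply that
$$\frac{\frac{1}{n}\sum_{i=1}^n \pi_{n,i,i} \varepsilon_i^2(s) - \mu_n}{\sqrt{v_n}} \lesssim_P \frac{L}{\sqrt{n}(\nu_{KL} + \lambda\rho_D)} \to 0.$$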
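Hence, we have
$$\frac{nB_{21}^\top \Phi_I B_{21} - \mu_n}{\sqrt{v_n}} = \sum_{1 \le i < j \le n} \zeta_{n,i,j} + o_P(1),$$
where $\zeta_{n,i,j} := 2(n\sqrt{v_n})^{-1} \pi_{n,i,j} \varepsilon_i(s)\varepsilon_j(s)$.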
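To derive the limiting distribution of $\sum_{1 \le i < j \le n} \zeta_{n,i,j}$, we can use the central limit theorem for quadratic forms developed by de Jong (1987). From Proposition 3.2 of de Jong (1987), if (1) $\mathrm{Var}(\sum_{1 \le i < j \le n} \zeta_{n,i,j}) = 1 + o(1)$, (2) $G_{n,I} = o(1)$, (3) $G_{n,II} = o(1)$, and (4) $G_{n,IV} = o(1)$, we have $\sum_{1 \le i < j \le n} \zeta_{n,i,j} \xrightarrow{d} N(0, 1)$, where
$$\begin{aligned}
G_{n,I} &:= \sum_{1 \le i < j \le n} \mathbb{E}(\zeta_{n,i,j}^4), \\
G_{n,II} &:= \sum_{1 \le i < j < k \le n} \mathbb{E}\left( \zeta_{n,i,j}^2 \zeta_{n,i,k}^2 + \zeta_{n,j,i}^2 \zeta_{n,j,k}^2 + \zeta_{n,k,i}^2 \zeta_{n,k,j}^2 \right), \\
G_{n,IV} &:= \sum_{1 \le i < j < k < l \le n} \mathbb{E}\left( \zeta_{n,i,j}\zeta_{n,i,k}\zeta_{n,l,j}\zeta_{n,l,k} + \zeta_{n,i,j}\zeta_{n,i,l}\zeta_{n,k,j}\zeta_{n,k,l} + \zeta_{n,i,k}\zeta_{n,i,l}\zeta_{n,j,k}\zeta_{n,j,l} \right).
\end{aligned}$$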
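For (1), observe that
$$\begin{aligned}
\mathrm{Var}\left( \frac{2}{n} \sum_{1 \le i < j \le n} \pi_{n,i,j} \varepsilon_i(s)\varepsilon_j(s) \right) &= \frac{4}{n^2} \sum_{1 \le i < j \le n} \sum_{1 \le k < l \le n} \pi_{n,i,j}\pi_{n,k,l}\, \mathbb{E}[\varepsilon_i(s)\varepsilon_j(s)\varepsilon_k(s)\varepsilon_l(s)] \\
&= \frac{4}{n^2} \sum_{1 \le i < j \le n} \pi_{n,i,j}^2\, \mathbb{E}[\varepsilon_i^2(s)]\,\mathbb{E}[\varepsilon_j^2(s)] \\
&= \frac{2}{n^2} \sum_{i \ne j} \mathrm{tr}\left\{ \Xi_n^\top \Phi_I \Xi_n z_j z_j^\top \Xi_n^\top \Phi_I \Xi_n z_i z_i^\top \right\} \mathbb{E}[\varepsilon_i^2(s)]\,\mathbb{E}[\varepsilon_j^2(s)] \\
&= v_n - \frac{2}{n^2} \sum_{i=1}^n \mathrm{tr}\left\{ \Xi_n^\top \Phi_I \Xi_n z_i z_i^\top \Xi_n^\top \Phi_I \Xi_n z_i z_i^\top \right\} \left( \mathbb{E}[\varepsilon_i^2(s)] \right)^2.
\end{aligned}$$
By easy calculation, we can find
$$\frac{2}{n^2} \sum_{i=1}^n \mathrm{tr}\left\{ \Xi_n^\top \Phi_I \Xi_n z_i z_i^\top \Xi_n^\top \Phi_I \Xi_n z_i z_i^\top \right\} \left( \mathbb{E}[\varepsilon_i^2(s)] \right)^2 \lesssim \frac{KL}{n(\nu_{KL} + \lambda\rho_D)^2}.$$
Then, by (A.12), $\mathrm{Var}(\sum_{1 \le i < j \le n} \zeta_{n,i,j}) \to 1$.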
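(2), (3), and (4) can be verified in the same manner as in the proof of Lemma A.11 of Hoshino (2022). Indeed, the following results hold:
$$\begin{aligned}
G_{n,I} &\lesssim \frac{L^3}{n^2 K^2 [\nu_{KL} + \lambda\rho_D]^4} + \frac{L^4}{n^3 K^2 [\nu_{KL} + \lambda\rho_D]^4}, \\
G_{n,II} &\lesssim \frac{L^2}{n K^2 [\nu_{KL} + \lambda\rho_D]^4} + \frac{L^3}{n^2 K^2 [\nu_{KL} + \lambda\rho_D]^4} + \frac{L^4}{n^3 K^2 [\nu_{KL} + \lambda\rho_D]^4}, \\
G_{n,IV} &\lesssim \frac{1}{K [\nu_{KL} + \lambda\rho_D]^2} + \frac{L^4}{n K^2 [\nu_{KL} + \lambda\rho_D]^4} + \frac{L^4}{n^2 K^2 [\nu_{KL} + \lambda\rho_D]^4} - G_{n,II}.
\end{aligned}$$
This completes the proof.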
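The centering and scaling constants in Lemma A.5 can be computed directly from their definitions. The sketch below uses hypothetical inputs (a random $\Xi_n$, a random symmetric positive semidefinite $\Phi_I$, random instruments $z_i$, and heteroskedastic variances $\sigma_i^2 = \mathbb{E}[\varepsilon_i^2(s)]$ with $V_n(s) = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_n^2)$ assumed), forms $\pi_{n,i,j}$, and checks two facts used in the proof: $\mu_n = n^{-1}\sum_i \pi_{n,i,i}\sigma_i^2$ and $v_n = 2n^{-2}\sum_{i,j}\pi_{n,i,j}^2\sigma_i^2\sigma_j^2$, so that the variance of the off-diagonal part equals $v_n$ minus the diagonal correction.

```python
import numpy as np

rng = np.random.default_rng(6)
n, L, K = 120, 5, 3
Z = rng.normal(size=(n, L))
Xi = rng.normal(size=(K, L))              # hypothetical Xi_n (K x L)
C = rng.normal(size=(K, K))
Phi = C @ C.T                             # hypothetical Phi_I, symmetric PSD
sigma2 = rng.uniform(0.5, 2.0, size=n)    # sigma_i^2 = E[eps_i(s)^2]

core = Xi.T @ Phi @ Xi                    # Xi' Phi_I Xi (L x L)
Pi = Z @ core @ Z.T                       # pi_{n,i,j} = z_i' Xi' Phi_I Xi z_j
ZVZ = Z.T @ (sigma2[:, None] * Z) / n     # Z' V_n(s) Z / n

mu_n = np.trace(core @ ZVZ)
v_n = 2 * np.trace(core @ ZVZ @ core @ ZVZ)

mu_alt = np.sum(np.diag(Pi) * sigma2) / n
v_alt = 2 * np.sum(Pi ** 2 * np.outer(sigma2, sigma2)) / n ** 2
offdiag_var = v_alt - 2 * np.sum(np.diag(Pi) ** 2 * sigma2 ** 2) / n ** 2
print(np.isclose(mu_n, mu_alt), np.isclose(v_n, v_alt), offdiag_var > 0)
```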
Proof of Theorem 3.3
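Our test statistic is defined as a standardization of $T_n = n\int_I \hat{\alpha}_n^2(t, s)\,dt$. Trivially, under $H_0$, we can write $T_n = n\int_I (\hat{\alpha}_n(t, s) - \alpha_0(t, s))^2\,dt$. In view of (A.12), if we can verify that
$$T_n - nB_{21}^\top \Phi_I B_{21} = o_P(\sqrt{K}),$$
the proof is completed by Lemma A.5.
Observe that
$$\begin{aligned}
T_n &= n\int_I (\hat{\alpha}_n(t, s) - \alpha_0(t, s))^2\,dt = n\int_I \left( \phi_K(t)^\top [\hat{\theta}_n(s) - \theta_0(s)] + [\phi_K(t)^\top \theta_0(s) - \alpha_0(t, s)] \right)^2 dt \\
&= n[\hat{\theta}_n(s) - \theta_0(s)]^\top \Phi_I [\hat{\theta}_n(s) - \theta_0(s)] + 2n\int_I \phi_K(t)^\top [\hat{\theta}_n(s) - \theta_0(s)][\phi_K(t)^\top \theta_0(s) - \alpha_0(t, s)]\,dt \\
&\quad + n\underbrace{\int_I (\phi_K(t)^\top \theta_0(s) - \alpha_0(t, s))^2\,dt}_{\le \ell_K^2(s)}.
\end{aligned}$$
By the Cauchy–Schwarz inequality,
$$\begin{aligned}
\left| \int_I \phi_K(t)^\top [\hat{\theta}_n(s) - \theta_0(s)][\phi_K(t)^\top \theta_0(s) - \alpha_0(t, s)]\,dt \right| &\le \ell_K(s) \left\| \phi_K(\cdot)^\top [\hat{\theta}_n(s) - \theta_0(s)] \right\|_{L_2} \\
&\lesssim_P \frac{\ell_K(s)\sqrt{K}}{\sqrt{n(\nu_{KL} + \lambda\rho_D)}} + \frac{\ell_K^2(s)}{\sqrt{\nu_{KL} + \lambda\rho_D}} + \frac{\lambda \ell_K(s) \|\theta_0(s)\|_D}{\nu_{KL} + \lambda\rho_D}
\end{aligned}$$
as in the proof of Theorem 3.1(ii).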
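Next, using the decomposition in the proof of Theorem 3.1(ii), write
$$n[\hat{\theta}_n(s) - \theta_0(s)]^\top \Phi_I [\hat{\theta}_n(s) - \theta_0(s)] = n\sum_{a=1}^3 \sum_{b=1}^3 B_a^\top \Phi_I B_b.$$
We can easily observe the following results:
$$\left| B_1^\top \Phi_I B_1 \right| \lesssim_P \frac{\lambda^2 \|\theta_0(s)\|_D^2}{(\nu_{KL} + \lambda\rho_D)^2}, \qquad \left| B_3^\top \Phi_I B_3 \right| \lesssim_P \frac{\ell_K^2(s)}{\nu_{KL} + \lambda\rho_D}, \qquad \left| B_1^\top \Phi_I B_3 \right| \lesssim_P \frac{\lambda \ell_K(s) \|\theta_0(s)\|_D}{(\nu_{KL} + \lambda\rho_D)^{3/2}}.$$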
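Moreover, write $B_1^\top \Phi_I B_2 = B_1^\top \Phi_I (B_{21} + B_{22} + B_{23})$. By similar calculations as above,
$$\begin{aligned}
\left| B_1^\top \Phi_I B_{21} \right| &= \lambda \left| \theta_0(s)^\top D \hat{\Sigma}_{n,r,\lambda}^{-1} \Phi_I \Sigma_{n,r,\lambda}^{-1} \mathbb{E}R_x^\top M_z \mathcal{E}(s)/n \right| \\
&\le \lambda \left| \theta_0(s)^\top D \Sigma_{n,r,\lambda}^{-1} \Phi_I \Sigma_{n,r,\lambda}^{-1} \mathbb{E}R_x^\top M_z \mathcal{E}(s)/n \right| + \lambda \left| \theta_0(s)^\top D \left\{ \hat{\Sigma}_{n,r,\lambda}^{-1} - \Sigma_{n,r,\lambda}^{-1} \right\} \Phi_I \Sigma_{n,r,\lambda}^{-1} \mathbb{E}R_x^\top M_z \mathcal{E}(s)/n \right| \\
&\lesssim_P \frac{\lambda \|\theta_0(s)\|_D}{\sqrt{n}(\nu_{KL} + \lambda\rho_D)^{3/2}} + \frac{\lambda \|\theta_0(s)\|_D K\sqrt{L}}{n(\nu_{KL} + \lambda\rho_D)^{5/2}}.
\end{aligned}$$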
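Similarly,
$$\left| B_1^\top \Phi_I B_{22} \right| = \lambda \left| \theta_0(s)^\top D \hat{\Sigma}_{n,r,\lambda}^{-1} \Phi_I \left\{ \hat{\Sigma}_{n,r,\lambda}^{-1} - \Sigma_{n,r,\lambda}^{-1} \right\} \mathbb{E}R_x^\top M_z \mathcal{E}(s)/n \right| \lesssim_P \frac{\lambda \|\theta_0(s)\|_D K\sqrt{L}}{n(\nu_{KL} + \lambda\rho_D)^3},$$
and
$$\left| B_1^\top \Phi_I B_{23} \right| = \lambda \left| \theta_0(s)^\top D \hat{\Sigma}_{n,r,\lambda}^{-1} \Phi_I \hat{\Sigma}_{n,r,\lambda}^{-1} \left\{ (R_x^\top Z/n) - \mathbb{E}(R_x^\top Z/n) \right\} (Z^\top Z/n)^- Z^\top \mathcal{E}(s)/n \right| \lesssim_P \frac{\lambda \|\theta_0(s)\|_D L\sqrt{K}}{n(\nu_{KL} + \lambda\rho_D)^2}.$$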
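We can also observe that
$$\begin{aligned}
\left| B_3^\top \Phi_I B_{21} \right| &\lesssim_P \frac{\ell_K(s)}{\sqrt{n}(\nu_{KL} + \lambda\rho_D)} + \frac{\ell_K(s) K\sqrt{L}}{n(\nu_{KL} + \lambda\rho_D)^{5/2}} + \frac{\ell_K(s) K\sqrt{L}}{n(\nu_{KL} + \lambda\rho_D)^{3/2}}, \\
\left| B_3^\top \Phi_I B_{22} \right| &\lesssim_P \frac{\ell_K(s) K\sqrt{L}}{n(\nu_{KL} + \lambda\rho_D)^{5/2}}, \qquad \left| B_3^\top \Phi_I B_{23} \right| \lesssim_P \frac{\ell_K(s) L\sqrt{K}}{n(\nu_{KL} + \lambda\rho_D)^{3/2}}, \\
\left| B_{21}^\top \Phi_I B_{22} \right| &\lesssim_P \frac{K\sqrt{KL}}{n^{3/2}(\nu_{KL} + \lambda\rho_D)^{5/2}}, \qquad \left| B_{21}^\top \Phi_I B_{23} \right| \lesssim_P \frac{KL}{n^{3/2}(\nu_{KL} + \lambda\rho_D)^{3/2}}, \\
\left| B_{22}^\top \Phi_I B_{22} \right| &\lesssim_P \frac{K^2 L}{n^2(\nu_{KL} + \lambda\rho_D)^4}, \qquad \left| B_{23}^\top \Phi_I B_{23} \right| \lesssim_P \frac{KL^2}{n^2(\nu_{KL} + \lambda\rho_D)^2}, \qquad \left| B_{22}^\top \Phi_I B_{23} \right| \lesssim_P \frac{(KL)^{3/2}}{n^2(\nu_{KL} + \lambda\rho_D)^3}.
\end{aligned}$$
Under the assumptions made, we can find that $T_n = nB_{21}^\top \Phi_I B_{21} + o_P(\sqrt{K})$, as desired.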
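Because $\hat\alpha_n(t,s) = \phi_K(t)^\top\hat\theta_n(s)$, the statistic $T_n = n\int_I \hat\alpha_n^2(t,s)\,dt$ reduces to the quadratic form $n\hat\theta_n(s)^\top\Phi_I\hat\theta_n(s)$ with $\Phi_I = \int_I \phi_K(t)\phi_K(t)^\top dt$, the same matrix that appears in the expansion above. The sketch below computes $\Phi_I$ with the trapezoidal rule and checks the quadratic form against direct numerical integration of $\hat\alpha_n^2$. The cubic polynomial basis, the interval $I = [0.2, 0.8]$, and the coefficient vector are hypothetical choices, not the ones used in the paper.

```python
import numpy as np


def basis(t, K=4):
    """Hypothetical basis phi_K(t) = (1, t, t^2, t^3); returns shape (len(t), K)."""
    return np.vander(np.atleast_1d(t), K, increasing=True)


def trapezoid(y, x):
    """Trapezoidal rule over the first axis of y (y may be vector- or matrix-valued)."""
    dx = np.diff(x).reshape((-1,) + (1,) * (y.ndim - 1))
    return np.sum(0.5 * (y[1:] + y[:-1]) * dx, axis=0)


t = np.linspace(0.2, 0.8, 1201)                   # hypothetical interval I
Phi_t = basis(t)                                   # (G, K)
outer = Phi_t[:, :, None] * Phi_t[:, None, :]      # phi_K(t) phi_K(t)' at each grid point
Phi_I = trapezoid(outer, t)                        # (K, K)

n = 400
theta_hat = np.array([0.3, -1.0, 0.5, 2.0])        # arbitrary coefficients at a fixed s
T_quadratic = n * theta_hat @ Phi_I @ theta_hat
T_direct = n * trapezoid((Phi_t @ theta_hat) ** 2, t)
print(T_quadratic, T_direct, np.isclose(T_quadratic, T_direct))
```

Since the same quadrature rule is applied in both computations and it is linear in the integrand, the two values agree up to rounding error.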
Proof of Theorem 3.4
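Observe that
$$\begin{aligned}
\hat{Q}(s) &= W_n \int_0^1 Q(t)\,\alpha_0(t, s)\,dt + X\beta_0(s) + \mathcal{E}(s) + \mathbf{E}(s) \\
&= W_n \int_0^1 \hat{Q}(t)\,\alpha_0(t, s)\,dt - W_n \int_0^1 \mathbf{E}(t)\,\alpha_0(t, s)\,dt + X\beta_0(s) + \mathcal{E}(s) + \mathbf{E}(s) \\
&= \hat{R}\theta_0(s) + X\beta_0(s) + \mathcal{E}(s) + \mathbf{E}(s) - V(s) + \hat{U}(s),
\end{aligned}$$
where $\mathbf{E} = (e_1, \ldots, e_n)^\top$, $e_i := \hat{q}_i - q_i$, $V(s) = (v_1(s), \ldots, v_n(s))^\top$, $v_i(s) := \sum_{j=1}^n w_{i,j} \int_0^1 e_j(t)\,\alpha_0(t, s)\,dt$, $\hat{U}(s) = (\hat{u}_1(s), \ldots, \hat{u}_n(s))^\top$, and $\hat{u}_i(s) := \sum_{j=1}^n w_{i,j} \int_0^1 \hat{q}_j(t)\,\alpha_0(t, s)\,dt - \hat{r}_i^\top \theta_0(s)$. As shown in (A.4), $|e_i| \lesssim \kappa^\xi$ uniformly. From this, $|v_i(s)| \lesssim \kappa^\xi$ is straightforward. Further, similar to (A.5), we have $|\hat{u}_i(s)| \lesssim \ell_K(s)$ uniformly. Write
$$\begin{aligned}
\tilde{\beta}_n(s) - \beta_0(s) &= \left[ X^\top (I_n - \hat{S}) X \right]^{-1} X^\top (I_n - \hat{S}) \hat{Q}(s) - \beta_0(s) \\
&= \left[ X^\top (I_n - \hat{S}) X \right]^{-1} X^\top (I_n - \hat{S}) [\mathcal{E}(s) + \hat{U}(s) + \mathbf{E}(s) - V(s)] \\
&=: A'_1 + A'_2 + A'_3 + A'_4, \text{ say.}
\end{aligned}$$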
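In this proof the regressors are built from interpolated curves, $\hat r_{i,k} = \sum_j w_{i,j}\int_0^1 \hat q_j(t)\phi_k(t)\,dt$, by analogy with $r_{i,k} = \sum_j w_{i,j}\int_0^1 q_j(t)\phi_k(t)\,dt$ in the proof of Lemma A.3. A minimal sketch of that construction follows, under illustrative assumptions: a random row-normalized $W_n$, a hypothetical cosine basis for $\phi_k$, trapezoidal quadrature on a fine grid, and `numpy.interp` for the linear interpolation. When every $q_j$ is linear in $t$ and the endpoints $0$ and $1$ are among the observation points, interpolation is exact, so $\hat R$ must coincide with $R$.

```python
import numpy as np


def trapezoid(y, x):
    """Trapezoidal rule over the first axis of y."""
    dx = np.diff(x).reshape((-1,) + (1,) * (y.ndim - 1))
    return np.sum(0.5 * (y[1:] + y[:-1]) * dx, axis=0)


def build_R(W, curves, grid, K):
    """r_{i,k} = sum_j w_{ij} int_0^1 q_j(t) phi_k(t) dt, with a hypothetical cosine basis."""
    phi = np.column_stack([np.cos(np.pi * k * grid) for k in range(K)])       # (G, K)
    integrals = trapezoid(curves.T[:, :, None] * phi[:, None, :], grid)      # (n, K)
    return W @ integrals


rng = np.random.default_rng(7)
n, m, K = 25, 15, 4
grid = np.linspace(0.0, 1.0, 2001)
W = rng.uniform(size=(n, n))
np.fill_diagonal(W, 0.0)
W /= W.sum(axis=1, keepdims=True)                  # row-normalized spatial weights (hypothetical)

a, b = rng.normal(size=n), rng.normal(size=n)
q_true = a[:, None] + b[:, None] * grid[None, :]   # linear curves q_j(t) = a_j + b_j t

q_interp = np.empty_like(q_true)
for j in range(n):
    # observation points include both endpoints, so linear interpolation is exact here
    s_j = np.sort(np.concatenate(([0.0, 1.0], rng.uniform(size=m - 2))))
    q_interp[j] = np.interp(grid, s_j, a[j] + b[j] * s_j)

R = build_R(W, q_true, grid, K)
R_hat = build_R(W, q_interp, grid, K)
print(np.abs(R_hat - R).max())
```

For curves that are not linear, $\hat R - R$ is driven by the interpolation error bounded in (A.4), which is how the $\kappa^\xi$ terms enter the rates in this proof.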
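Applying Fact A.2 in Hoshino (2022) and Lemmas A.3(ii) and A.4(iv), we have
$$\begin{aligned}
&\left\| X^\top \hat{S} X/n - \mathbb{E}(X^\top R/n) \left[ \mathbb{E}R^\top M_z \mathbb{E}R/n \right]^{-1} \mathbb{E}(R^\top X/n) \right\| \\
&\quad \lesssim \rho_{\max}\left( (\hat{R}^\top X/n)(X^\top \hat{R}/n) \right) \left\| \left[ \hat{R}^\top M_z \hat{R}/n \right]^{-1} - \left[ \mathbb{E}R^\top M_z \mathbb{E}R/n \right]^{-1} \right\| \\
&\qquad + \rho_{\max}\left( \left[ \mathbb{E}R^\top M_z \mathbb{E}R/n \right]^{-1} \right) \left\| \hat{R}^\top X/n - \mathbb{E}(R^\top X/n) \right\| \\
&\quad \lesssim_P \frac{\sqrt{KL}}{\nu_{KL}^2 \sqrt{n}} + \frac{\kappa^\xi \sqrt{KL}}{\nu_{KL}^2} + \frac{\sqrt{K}}{\nu_{KL}\sqrt{n}} + \frac{\kappa^\xi \sqrt{K}}{\nu_{KL}}.
\end{aligned}$$
This implies that $\|X^\top (I_n - \hat{S}) X/n - \Sigma_{n,x}\| \lesssim_P \sqrt{KL}/(\nu_{KL}^2 \sqrt{n})$ and that $\rho_{\min}(X^\top (I_n - \hat{S}) X/n) > 0$ with probability approaching one. Then, by the same argument as in the proof of Theorem 3.1(i), we can easily observe that $\|A'_2\| \lesssim_P \ell_K(s)$, $\|A'_3\| \lesssim_P \kappa^\xi$, and $\|A'_4\| \lesssim_P \kappa^\xi$ hold.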
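For $A'_1$, decompose $A'_1 = A_1 + A'_{12} + A'_{13}$, where
$$\begin{aligned}
A_1 &= \left[ X^\top (I_n - S) X/n \right]^{-1} X^\top (I_n - S) \mathcal{E}(s)/n, \\
A'_{12} &= \left[ X^\top (I_n - S) X/n \right]^{-1} X^\top (S - \hat{S}) \mathcal{E}(s)/n, \\
A'_{13} &= \left\{ \left[ X^\top (I_n - \hat{S}) X/n \right]^{-1} - \left[ X^\top (I_n - S) X/n \right]^{-1} \right\} X^\top (I_n - \hat{S}) \mathcal{E}(s)/n.
\end{aligned}$$
Write
$$X^\top (S - \hat{S}) \mathcal{E}(s)/n = (X^\top Z/n)(Z^\top Z/n)^- \underbrace{\left\{ (Z^\top R/n)\left[ R^\top M_z R/n \right]^{-1}(R^\top Z/n) - (Z^\top \hat{R}/n)\left[ \hat{R}^\top M_z \hat{R}/n \right]^{-1}(\hat{R}^\top Z/n) \right\}}_{=: L} (Z^\top Z/n)^- Z^\top \mathcal{E}(s)/n.$$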
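By the same argument as above, we observe that
$$\|L\| \lesssim \rho_{\max}\left( (R^\top Z/n)(Z^\top R/n) \right) \left\| \left[ R^\top M_z R/n \right]^{-1} - \left[ \hat{R}^\top M_z \hat{R}/n \right]^{-1} \right\| + \rho_{\max}\left( \left[ \hat{R}^\top M_z \hat{R}/n \right]^{-1} \right) \left\| R^\top Z/n - \hat{R}^\top Z/n \right\| \lesssim_P \frac{\kappa^\xi \sqrt{KL}}{\nu_{KL}^2} + \frac{\kappa^\xi \sqrt{KL}}{\nu_{KL}}.$$
Thus,
$$\begin{aligned}
\|A'_{12}\|^2 &= \mathcal{E}(s)^\top (S - \hat{S}) X \left[ X^\top (I_n - S) X/n \right]^{-2} X^\top (S - \hat{S}) \mathcal{E}(s)/n^2 \\
&\lesssim_P (\mathcal{E}(s)^\top Z/n)(Z^\top Z/n)^- L (Z^\top Z/n)^- (Z^\top X/n)(X^\top Z/n)(Z^\top Z/n)^- L (Z^\top Z/n)^- (Z^\top \mathcal{E}(s)/n) \\
&\lesssim_P \|L\|^2 \cdot \|Z^\top \mathcal{E}(s)/n\|^2,
\end{aligned}$$
which yields $\|A'_{12}\| \lesssim_P \kappa^\xi L\sqrt{K}/(\sqrt{n}\nu_{KL}^2)$. Similarly,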
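$$\begin{aligned}
\|A'_{13}\| &\lesssim_P \left\| \left[ X^\top (I_n - \hat{S}) X/n \right]^{-1} - \left[ X^\top (I_n - S) X/n \right]^{-1} \right\| \\
&\quad \times \left\{ \|X^\top \mathcal{E}(s)/n\| + \left\| X^\top Z (Z^\top Z)^- Z^\top \hat{R} \left[ \hat{R}^\top M_z \hat{R} \right]^{-1} \hat{R}^\top Z (Z^\top Z)^- Z^\top \mathcal{E}(s)/n \right\| \right\} \\
&\lesssim_P \frac{\kappa^\xi \sqrt{KL}}{\sqrt{n}\nu_{KL}^2} + \frac{\kappa^\xi L\sqrt{K}}{\sqrt{n}\nu_{KL}^2}.
\end{aligned}$$
Combining all these results, we have
$$\tilde{\beta}_n(s) - \beta_0(s) - A_1 \lesssim_P \frac{\kappa^\xi L\sqrt{K}}{\sqrt{n}\nu_{KL}^2} + \ell_K(s) + \kappa^\xi = o_P(n^{-1/2}),$$
which implies that $\tilde{\beta}_n(s)$ and $\hat{\beta}_n(s)$ have the same asymptotic distribution.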
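Next, similar to the above discussion, decompose
$$\begin{aligned}
\tilde{\theta}_n(s) - \theta_0(s) &= \left[ \hat{R}_x^\top M_z \hat{R}_x + \lambda D_n \right]^{-1} \hat{R}_x^\top M_z \hat{Q}(s) - \theta_0(s) \\
&= \left[ \hat{R}_x^\top M_z \hat{R}_x + \lambda D_n \right]^{-1} \hat{R}_x^\top M_z \hat{R}\theta_0(s) - \theta_0(s) \\
&\quad + \left[ \hat{R}_x^\top M_z \hat{R}_x + \lambda D_n \right]^{-1} \hat{R}_x^\top M_z [\mathcal{E}(s) + \hat{U}(s) + \mathbf{E}(s) - V(s)] \\
&=: B'_1 + B'_2 + B'_3 + B'_4 + B'_5, \text{ say.}
\end{aligned}$$
By Lemma A.4(iii), we have $\rho_{\max}([\hat{R}_x^\top M_z \hat{R}_x/n + \lambda D]^{-1}) \lesssim_P (\nu_{KL} + \lambda\rho_D)^{-1}$. Then, for $B'_1$, noting that $B'_1 = -\lambda [\hat{R}_x^\top M_z \hat{R}_x/n + \lambda D]^{-1} D\theta_0(s)$, we have
$$\|B'_1\| \lesssim_P \frac{\lambda \|\theta_0(s)\|_D}{\nu_{KL} + \lambda\rho_D}.$$
Additionally, we can easily find that $\|B'_3\| \lesssim_P \ell_K(s)/\sqrt{\nu_{KL} + \lambda\rho_D}$, $\|B'_4\| \lesssim_P \kappa^\xi/\sqrt{\nu_{KL} + \lambda\rho_D}$, and $\|B'_5\| \lesssim_P \kappa^\xi/\sqrt{\nu_{KL} + \lambda\rho_D}$ hold.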
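For $B'_2$, decompose it further as $B'_2 = B_2 + B'_{22} + B'_{23}$, where
$$\begin{aligned}
B_2 &= \left[ R_x^\top M_z R_x/n + \lambda D \right]^{-1} R_x^\top M_z \mathcal{E}(s)/n, \\
B'_{22} &= \left[ R_x^\top M_z R_x/n + \lambda D \right]^{-1} (\hat{R}_x - R_x)^\top M_z \mathcal{E}(s)/n, \\
B'_{23} &= \left\{ \left[ \hat{R}_x^\top M_z \hat{R}_x/n + \lambda D \right]^{-1} - \left[ R_x^\top M_z R_x/n + \lambda D \right]^{-1} \right\} \hat{R}_x^\top M_z \mathcal{E}(s)/n.
\end{aligned}$$
By Lemma A.3(ii), we have
$$\begin{aligned}
\|B'_{22}\| &\le \left\| \left[ R_x^\top M_z R_x/n + \lambda D \right]^{-1} (Z^\top \hat{R}/n - Z^\top R/n)^\top (Z^\top Z/n)^- Z^\top \mathcal{E}(s)/n \right\| \\
&\quad + \left\| \left[ R_x^\top M_z R_x/n + \lambda D \right]^{-1} (X^\top \hat{R}/n - X^\top R/n)^\top (X^\top X/n)^{-1} X^\top \mathcal{E}(s)/n \right\| \\
&\lesssim_P \frac{\kappa^\xi L\sqrt{K}}{\sqrt{n}(\nu_{KL} + \lambda\rho_D)} + \frac{\kappa^\xi \sqrt{K}}{\sqrt{n}(\nu_{KL} + \lambda\rho_D)}.
\end{aligned}$$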
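It also holds that
$$\|B'_{23}\| \le \left\| \left[ \hat{R}_x^\top M_z \hat{R}_x/n + \lambda D \right]^{-1} - \left[ R_x^\top M_z R_x/n + \lambda D \right]^{-1} \right\| \cdot \left\| \hat{R}_x^\top M_z \mathcal{E}(s)/n \right\| \lesssim_P \frac{\kappa^\xi L\sqrt{K}}{\sqrt{n}(\nu_{KL} + \lambda\rho_D)^2}.$$
Hence, we have
$$\begin{aligned}
\frac{\sqrt{n}(\tilde{\alpha}_n(t, s) - \alpha_0(t, s) - \phi_K(t)^\top B_2)}{\sigma_{n,\lambda}(t, s)} &= \frac{\sqrt{n}\phi_K(t)^\top (B'_1 + B'_{22} + B'_{23} + B'_3 + B'_4 + B'_5)}{\sigma_{n,\lambda}(t, s)} + \frac{\sqrt{n}(\phi_K(t)^\top \theta_0(s) - \alpha_0(t, s))}{\sigma_{n,\lambda}(t, s)} \\
&\lesssim_P \frac{\sqrt{n}\lambda \|\theta_0(s)\|_D}{\nu_{KL} + \lambda\rho_D} + \frac{\kappa^\xi L\sqrt{K}}{(\nu_{KL} + \lambda\rho_D)^2} + \frac{\sqrt{n}(\ell_K(s) + \kappa^\xi)}{\sqrt{\nu_{KL} + \lambda\rho_D}} + \frac{\sqrt{n}\,|\phi_K(t)^\top \theta_0(s) - \alpha_0(t, s)|}{\|\phi_K(t)\|} = o_P(1),
\end{aligned}$$
implying the desired result.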
B
Supplementary simulation results
This section provides the detailed simulation results under incompletely observed outcome functions. As described in the main text, we suppose that for each unit $i$ we can observe $\{(s_{i,j}, q_i(s_{i,j}))\}_{j=1}^m$, where the $s_{i,j}$'s are drawn uniformly at random from $[0, 1]$, for $m \in \{15, 50\}$. To recover the entire functional form of the $q_i$ function, we apply the linear interpolation method in (3.2). The simulation scenarios examined are all identical to those used in the main text. The results for the 2SLS estimator and the Wald test are summarized in Tables B1 and B2, respectively.
Table B1: Estimation performance
                       |        β         |  α (λc = 0.5)    |   α (λc = 1)     |   α (λc = 2)     |   α (λc = 3)
DGP  m   n     # knots |    BIAS     RMSE |    BIAS     RMSE |    BIAS     RMSE |    BIAS     RMSE |    BIAS     RMSE
1    15  400   2       |  0.0002   0.0376 |  0.0214   0.1071 |  0.0228   0.1036 |  0.0203   0.1022 |  0.0162   0.1010
1    15  400   3       |  0.0004   0.0385 |  0.0221   0.1199 |  0.0230   0.1157 |  0.0197   0.1136 |  0.0149   0.1119
1    15  1600  2       |  0.0008   0.0184 |  0.0161   0.0974 |  0.0201   0.0962 |  0.0223   0.0978 |  0.0220   0.0991
1    15  1600  3       |  0.0007   0.0184 |  0.0170   0.1118 |  0.0208   0.1097 |  0.0226   0.1105 |  0.0220   0.1112
1    50  400   2       | -0.0010   0.0370 |  0.0202   0.1060 |  0.0214   0.1025 |  0.0189   0.1010 |  0.0148   0.0997
1    50  400   3       | -0.0008   0.0372 |  0.0208   0.1187 |  0.0217   0.1147 |  0.0184   0.1125 |  0.0136   0.1108
1    50  1600  2       | -0.0004   0.0176 |  0.0152   0.0959 |  0.0190   0.0951 |  0.0211   0.0968 |  0.0208   0.0980
1    50  1600  3       | -0.0004   0.0177 |  0.0161   0.1102 |  0.0197   0.1086 |  0.0214   0.1094 |  0.0208   0.1102
2    15  400   2       | -0.0001   0.0377 | -0.0023   0.0917 | -0.0065   0.0829 | -0.0134   0.0769 | -0.0196   0.0742
2    15  400   3       |  0.0002   0.0388 | -0.0023   0.1074 | -0.0067   0.0990 | -0.0143   0.0931 | -0.0211   0.0903
2    15  1600  2       |  0.0007   0.0184 | -0.0001   0.0848 | -0.0024   0.0808 | -0.0060   0.0774 | -0.0091   0.0757
2    15  1600  3       |  0.0007   0.0184 |  0.0001   0.1015 | -0.0023   0.0976 | -0.0061   0.0942 | -0.0094   0.0925
2    50  400   2       | -0.0012   0.0375 | -0.0011   0.0921 | -0.0055   0.0836 | -0.0125   0.0775 | -0.0188   0.0748
2    50  400   3       | -0.0012   0.0376 | -0.0011   0.1078 | -0.0058   0.0996 | -0.0134   0.0937 | -0.0203   0.0908
2    50  1600  2       | -0.0005   0.0179 |  0.0013   0.0851 | -0.0012   0.0813 | -0.0049   0.0780 | -0.0081   0.0763
2    50  1600  3       | -0.0005   0.0181 |  0.0014   0.1018 | -0.0011   0.0981 | -0.0051   0.0948 | -0.0085   0.0930
3    15  400   2       | -0.0004   0.0401 |  0.0074   0.1841 |  0.0100   0.1964 |  0.0076   0.2076 |  0.0027   0.2125
3    15  400   3       | -0.0004   0.0410 |  0.0079   0.1870 |  0.0099   0.2008 |  0.0065   0.2125 |  0.0008   0.2172
3    15  1600  2       |  0.0007   0.0195 |  0.0006   0.1559 |  0.0061   0.1708 |  0.0096   0.1885 |  0.0098   0.1977
3    15  1600  3       |  0.0007   0.0197 |  0.0016   0.1552 |  0.0067   0.1732 |  0.0098   0.1927 |  0.0094   0.2024
3    50  400   2       | -0.0014   0.0395 |  0.0142   0.1823 |  0.0171   0.1946 |  0.0148   0.2061 |  0.0100   0.2112
3    50  400   3       | -0.0014   0.0399 |  0.0148   0.1852 |  0.0170   0.1993 |  0.0138   0.2116 |  0.0081   0.2165
3    50  1600  2       | -0.0005   0.0192 |  0.0071   0.1545 |  0.0127   0.1682 |  0.0165   0.1862 |  0.0168   0.1957
3    50  1600  3       | -0.0005   0.0192 |  0.0081   0.1533 |  0.0134   0.1707 |  0.0167   0.1907 |  0.0165   0.2009
C
Supplementary material for the empirical analysis in Section 5
This section provides supplementary material for the empirical analysis. Table C3 presents the definitions of the variables used and their summary statistics. The estimated functional coefficients and the 95% confidence intervals are shown in Figure C1.
Table B2: Rejection frequency
                         ϱ = 0                  ϱ = 0.1                ϱ = 0.2
n     # knots  m    λc   10%    5%     1%       10%    5%     1%       10%    5%     1%
400   2        15   0.5  0.071  0.035  0.016    0.261  0.194  0.105    0.967  0.939  0.829
                    1    0.063  0.035  0.020    0.710  0.608  0.440    1.000  0.998  0.992
                    2    0.056  0.032  0.015    0.946  0.921  0.861    1.000  1.000  1.000
                    3    0.051  0.035  0.019    0.974  0.959  0.921    1.000  1.000  1.000
               50   0.5  0.079  0.049  0.018    0.298  0.210  0.110    0.983  0.967  0.880
                    1    0.077  0.048  0.019    0.737  0.641  0.469    1.000  0.998  0.995
                    2    0.069  0.040  0.019    0.969  0.945  0.903    1.000  1.000  1.000
                    3    0.062  0.044  0.023    0.981  0.975  0.955    1.000  1.000  1.000
      3        15   0.5  0.068  0.034  0.016    0.242  0.178  0.096    0.955  0.919  0.797
                    1    0.060  0.034  0.016    0.690  0.590  0.430    1.000  0.998  0.991
                    2    0.052  0.031  0.013    0.941  0.912  0.844    1.000  1.000  1.000
                    3    0.045  0.033  0.016    0.964  0.950  0.917    1.000  1.000  1.000
               50   0.5  0.073  0.046  0.018    0.292  0.205  0.105    0.980  0.959  0.872
                    1    0.072  0.044  0.018    0.741  0.650  0.479    1.000  0.999  0.995
                    2    0.061  0.036  0.017    0.965  0.941  0.901    1.000  1.000  1.000
                    3    0.061  0.041  0.023    0.979  0.970  0.955    1.000  1.000  1.000
1600  2        15   0.5  0.058  0.043  0.023    0.505  0.364  0.188    0.998  0.996  0.991
                    1    0.057  0.043  0.023    0.936  0.875  0.692    1.000  1.000  0.999
                    2    0.063  0.040  0.022    0.997  0.997  0.990    1.000  1.000  1.000
                    3    0.064  0.041  0.022    0.999  0.999  0.998    1.000  1.000  1.000
               50   0.5  0.078  0.054  0.028    0.611  0.449  0.243    1.000  1.000  1.000
                    1    0.078  0.053  0.029    0.970  0.942  0.791    1.000  1.000  1.000
                    2    0.077  0.052  0.028    1.000  1.000  0.998    1.000  1.000  1.000
                    3    0.078  0.052  0.026    1.000  1.000  1.000    1.000  1.000  1.000
      3        15   0.5  0.056  0.041  0.023    0.501  0.353  0.176    0.999  0.996  0.990
                    1    0.055  0.041  0.022    0.940  0.869  0.695    1.000  1.000  0.999
                    2    0.062  0.039  0.021    0.998  0.997  0.990    1.000  1.000  1.000
                    3    0.061  0.040  0.020    1.000  0.999  0.998    1.000  1.000  1.000
               50   0.5  0.074  0.051  0.027    0.605  0.439  0.236    1.000  1.000  1.000
                    1    0.074  0.050  0.028    0.972  0.942  0.808    1.000  1.000  1.000
                    2    0.076  0.050  0.028    1.000  1.000  0.999    1.000  1.000  1.000
                    3    0.077  0.049  0.026    1.000  1.000  1.000    1.000  1.000  1.000
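Each entry of Table B2 is the fraction of simulated samples in which the test rejects at the 10%, 5%, or 1% nominal level. If ϱ indexes the strength of the spatial interaction, as the rising rejection rates suggest, the ϱ = 0 columns measure empirical size and the ϱ = 0.1 and ϱ = 0.2 columns measure power. The minimal Python sketch below shows how such entries could be tabulated from per-replication p-values, together with the usual binomial Monte Carlo standard error sqrt(f(1 − f)/R). It assumes one p-value per replication and the convention that the test rejects when p < level; the function names are hypothetical, and the number of replications R behind Table B2 is not stated in this part of the appendix, so it must be supplied by the reader.

```python
import math
from typing import Dict, Sequence


def rejection_frequencies(
    p_values: Sequence[float],
    levels: Sequence[float] = (0.10, 0.05, 0.01),
) -> Dict[float, float]:
    """Share of replications whose p-value falls below each nominal level.

    Assumes one p-value per Monte Carlo replication and the convention
    that the test rejects when p < level.
    """
    if len(p_values) == 0:
        raise ValueError("at least one replication is required")
    r = len(p_values)
    return {a: sum(1 for p in p_values if p < a) / r for a in levels}


def mc_standard_error(freq: float, replications: int) -> float:
    """Binomial Monte Carlo standard error sqrt(f(1 - f)/R) of a rejection frequency."""
    if replications <= 0:
        raise ValueError("replications must be positive")
    return math.sqrt(freq * (1.0 - freq) / replications)


# Toy p-values from five hypothetical replications (not from the paper).
pvals = [0.03, 0.20, 0.08, 0.50, 0.009]
print(rejection_frequencies(pvals))  # {0.1: 0.6, 0.05: 0.4, 0.01: 0.2}
```

The standard error gives a rough yardstick for reading the size columns: with a hypothetical R = 1000, a true rejection rate of 0.10 has a Monte Carlo standard error of about 0.0095, and a true rate of 0.05 about 0.007.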
Table C3: Descriptive statistics (n = 1883)
Variable       Mean     Std. Dev.  Min.     Max.
Landprice      10.083   1.191      7.313    14.883
Unemployment   3.687    1.134      0        10.635
Agriculture    0.078    0.081      0        0.467
Sales          10.500   2.196      0        17.666
Beds           1.089    1.051      0        13.489
Childcare      0.290    0.209      0.000    2.833
q(0.25)        31.801   5.638      17.904   58.162
q(0.5)         52.777   5.865      38.113   71.003
q(0.75)        69.895   3.877      54.193   99.028
Definitions: Landprice = log(average residential land price (JPY/m²)); Unemployment =
unemployment rate (%); Agriculture = proportion of agriculture, forestry, and fishery workers;
Sales = log(annual commercial sales (million JPY) + 1); Beds = 100 × (number of hospital
beds)/population; Childcare = 1000 × (number of childcare facilities)/population.
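The notes to Table C3 define the covariates through a few simple transformations of raw city-level quantities. The Python sketch below restates those definitions as code. The `CityRaw` record, its field names, and the helper `build_covariates` are hypothetical; the logarithm is assumed to be natural, and the denominator of Agriculture is assumed to be the total number of workers, since the notes do not say which population the proportion refers to.

```python
import math
from dataclasses import dataclass
from typing import Dict


@dataclass
class CityRaw:
    """Hypothetical raw inputs for one city (field names are illustrative only)."""
    avg_land_price_jpy_per_m2: float   # average residential land price
    unemployment_rate_pct: float       # already in percent
    primary_sector_workers: float      # agriculture, forestry, and fishery workers
    total_workers: float               # ASSUMED denominator for the proportion
    annual_sales_million_jpy: float    # annual commercial sales
    hospital_beds: float
    childcare_facilities: float
    population: float


def build_covariates(c: CityRaw) -> Dict[str, float]:
    """Apply the transformations listed in the notes to Table C3."""
    return {
        "Landprice": math.log(c.avg_land_price_jpy_per_m2),   # natural log assumed
        "Unemployment": c.unemployment_rate_pct,
        "Agriculture": c.primary_sector_workers / c.total_workers,
        "Sales": math.log(c.annual_sales_million_jpy + 1.0),  # +1 keeps zero sales finite
        "Beds": 100.0 * c.hospital_beds / c.population,
        "Childcare": 1000.0 * c.childcare_facilities / c.population,
    }


city = CityRaw(24000.0, 3.5, 50.0, 1000.0, 0.0, 300.0, 6.0, 30000.0)
print(build_covariates(city)["Beds"])  # 1.0
```

The +1 inside the logarithm maps a city with zero recorded sales to Sales = 0, which is consistent with the minimum of 0 reported in Table C3. Under the natural-log assumption, a land price of 24,000 JPY/m² gives a Landprice value of about 10.09, close to the sample mean.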

Figure C1: Estimated coefficients
