\section{Introduction}
%Talk about PCA/streaming PCA ...
Principal Component Analysis (PCA)~\citep{pearson1901liii, ziegel2003principal} is a cornerstone of statistical data analysis and visualization. Given a dataset $\{X_i\}_{i=1}^{n}$, where each $X_i \in \mathbb{R}^d$ is independently drawn from a distribution $\mathcal{P}$ with mean zero and covariance matrix $\Sigma$, PCA computes the eigenvector $v_1$ of $\Sigma$ that corresponds to the largest eigenvalue $\lambda_1$, which is the direction that explains the most variance in the data. It has been established~\citep{wedin1972perturbation, jain2016streaming, vershynin2010introduction} that the leading eigenvector $\hat{v}$ of the empirical covariance matrix $\hat{\Sigma} = \frac{1}{n} \sum_{i=1}^n X_iX_i^{\top}$ is a nearly optimal estimator of $v_1$ under suitable assumptions on the data distribution.

While theoretically appealing, explicitly computing the empirical covariance matrix $\hat{\Sigma}$ requires $O(nd^2)$ time and $O(d^2)$ space, which is expensive in high-dimensional settings where both the sample size and the dimension are large. Oja’s algorithm~\citep{oja1985stochastic}, a streaming algorithm inspired by Hebbian learning~\citep{hebb2005organization}, has emerged as an efficient and scalable alternative for PCA. It maintains a running estimate of $v_1$ via an update similar to projected stochastic gradient descent (SGD):
\begin{gather}\label{eq:ojaupdate}
u_i \gets u_{i-1} +\eta_n X_i(X_i^T u_{i-1}), \;\;
u_i \gets \frac{u_i}{\norm{u_i}_{2}}
\end{gather}
for $i \in [n]$, where $u_0$ is a random unit vector and $\eta_{n} > 0$ is the learning rate. The algorithm is single-pass, runs in $\mathcal{O}(nd)$ time, and uses only $\mathcal{O}(d)$ space. We call the output $u_n$ of the above algorithm the \textit{Oja vector} $\voja$.
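
To see why the per-step normalization in Eq~\eqref{eq:ojaupdate} does not change the final direction, note that each step applies a linear map followed by a positive rescaling, so all rescalings can be deferred to the end. As a sketch, and under the assumption that $B_n$ denotes the product of the per-sample factors (with $I$ the $d \times d$ identity matrix), unrolling the recursion gives
\bas{
B_n \defeq \bb{I + \eta_n X_n X_n^{\top}} \bb{I + \eta_n X_{n-1} X_{n-1}^{\top}} \cdots \bb{I + \eta_n X_1 X_1^{\top}}, \qquad u_n = \frac{B_n u_0}{\norm{B_n u_0}_{2}}.
}
Each factor $I + \eta_n X_i X_i^{\top}$ is positive definite for $\eta_n > 0$, so $B_n u_0 \neq 0$ and the ratio is well defined.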

Oja's algorithm has fueled significant research in theoretical statistics, applied mathematics, and computer science~\citep{jain2016streaming, allenzhu2017efficient, chen2018dimensionality, yang2018history, henriksen2019adaoja, mouzakis2022spectral, lunde2021bootstrapping, monnez2022stochastic, DBLP:journals/corr/abs-2102-03646, kumarsarkar2024markovoja, kumarsarkar2024sparse}.  %, it maintains competitive error guarantees while processing data sequentially. These advantages have fueled significant research interest in the theoretical statistics and computer science communities. 
Despite the plethora of work on sharp rates for the sin-squared error $\sin^2 \bb{\voja, v_1} := 1-(v_1^T\voja)^2$, entrywise uncertainty estimation for streaming PCA has received only limited attention. Since the update rule in Oja's algorithm is similar to those arising in a broad class of important non-convex problems, uncertainty estimation for Oja's algorithm has potential implications for matrix sensing~\citep{jain2013matcompl}, matrix completion~\citep{jain2013matcompl,keshavan2010completion}, subspace estimation~\citep{pmlr-v151-balzano22a}, and subspace tracking~\citep{balzano2010sstracking}. A notable exception is~\cite{lunde2021bootstrapping}, who show that $\sin^2 \bb{\voja, v_1}$ behaves asymptotically like a high-dimensional weighted chi-squared random variable. A main ingredient in their analysis is the Hoeffding decomposition of the matrix product $B_n$ that arises from unrolling the Oja update in Eq~\eqref{eq:ojaupdate}. Their method takes $O(bnd)$ time and $O(bd)$ space, where $b$ is the number of bootstrap replicas.
However, \cite{lunde2021bootstrapping} quantify the uncertainty of the $\sin^2$ error, whereas we are interested in coordinate-wise uncertainty estimation.
% While \cite{lunde2021bootstrapping} provides deviation bounds in $\ell_2$ norm, we are interested in coordinate-wise deviation bounds or uncertainty estimation. 

In contrast, in offline eigenvector analysis, there has been a surge of interest in \textit{two-to-infinity} ($\ell_{2\rightarrow \infty}$) error bounds for empirical eigenvectors and singular vectors of random matrices~\citep{eldridge2018unperturbed,Mao02102021,abbe2020entrywise,cape2017singular,abbe2022lptheory,cape2019signal}. However, none of these apply directly to the matrix product structure that arises from the Oja update in Eq~\eqref{eq:ojaupdate}. Recent advances on the concentration of matrix products~\citep{huang2022matrix,kathuria2020concentration} only bound the operator norm or the $\ell_q$ moment of the Schatten norm of the deviation of a matrix product, and do not provide non-trivial guarantees on the coordinates.

\textbf{Our contributions:} 

In this paper, we obtain \textit{finite sample and high probability deviation bounds} for elements of $\voja$.

1. We show that the deviation of the elements of $\voja$ is governed by a suitably defined limiting covariance matrix $\V$. Furthermore, for each $k$ in a subset $K$ of $[d]$ of interest, the distribution of the coordinate $\voja(k)$, when suitably centered and rescaled, is asymptotically normal with variance $\V_{kk}$.

2. We provide a sharp Bernstein-type concentration bound to show that \textit{uniformly over entries of $\voja$}, $\forall \; k \in [d],$
\ba{
   |e_k^{\top} (\underbrace{\voja -(v_1^T \voja) v_1}_{:= \roja})| = \tilde{O}\bb{\sqrt{\frac{\V_{kk}}{n}}}, \label{eq:per_coord_bound}
}
where $e_k$ denotes the $k^{\text{th}}$ standard basis vector. This is a surprising and sharp result because it can be used (see Lemma~\ref{lemma:entrywise_to_sin_squared}) to recover the optimal $\sin^2$ error up to logarithmic factors with high probability.

3. We provide an algorithm that couples a subsampling-based procedure, running in $O(nd)$ time and $O(d \log (d/\delta))$ space, with Median of Means~\citep{nemirovskij1983problem} to estimate the marginal variances of the elements of $\roja:=\voja -(v_1^T \voja) v_1$. Theorem~\ref{thm:high_prob_error_bound} provides high-probability error bounds for our variance estimator \textit{uniformly} over $k \in [d]$.

4. We present numerical experiments on synthetic and real-world data to demonstrate the empirical performance of our algorithm, and compare it to the multiplier bootstrap algorithm of~\cite{lunde2021bootstrapping}, showing that our estimator achieves similar accuracy in significantly less time.
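
To make the link between the per-coordinate bound in Eq~\eqref{eq:per_coord_bound} and the $\sin^2$ error concrete, the following short calculation may help. It is only a sketch that uses the facts that $\voja$ and $v_1$ are unit vectors and that the bound holds uniformly over $k \in [d]$; the precise statement, including constants and logarithmic factors, is the one in Lemma~\ref{lemma:entrywise_to_sin_squared}. Since $\roja = \voja -(v_1^T \voja) v_1$ is the component of $\voja$ orthogonal to $v_1$, we have $\norm{\voja}_{2}^2 = (v_1^T\voja)^2 + \norm{\roja}_{2}^2$, and therefore, with high probability,
\bas{
\sin^2 \bb{\voja, v_1} = 1-(v_1^T\voja)^2 = \norm{\roja}_{2}^2 = \sum_{k=1}^{d} \bb{e_k^{\top} \roja}^2 = \tilde{O}\bb{\frac{\Tr\bb{\V}}{n}}.
}
The last step sums the squared per-coordinate bounds over all $d$ coordinates, which is valid because the bound in Eq~\eqref{eq:per_coord_bound} holds simultaneously for every $k$.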


The paper is organized as follows: Section~\ref{ssec:related_work} discusses related work on streaming PCA, entrywise error bounds on eigenvectors, and statistical inference for Stochastic Gradient Descent. Section~\ref{sec:prelim} provides our problem setup, assumptions, and necessary preliminaries. Section~\ref{sec:main_results} presents our main results on entrywise concentration, the CLT, and our variance estimation algorithm (Algorithm~\ref{alg:variance_estimation}). We provide proof sketches in Section~\ref{sec:proof_techiniques} and experiments in Section~\ref{sec:experiments}.

\subsection{Related Work}\label{ssec:related_work}
%The majority of work in streaming principal component analysis has been centered around obtaining sharp error rates for the sin-squared error ~\cite{jain2016streaming, allenzhu2017efficient, chen2018dimensionality, yang2018history, henriksen2019adaoja, mouzakis2022spectral, lunde2021bootstrapping, monnez2022stochastic, DBLP:journals/corr/abs-2102-03646, kumarsarkar2024markovoja, kumarsarkar2024sparse} under different distributional assumptions. It has been shown that 
% \bas{
% 1-(v_1^T\voja)^2=O\bb{\frac{\Nu}{n(\lambda_1-\lambda_2)^2}}
% }
% where $\Nu$ is a variance parameter~\eqref{eq:Nu_assumptions}.
\textbf{Streaming PCA.} A crucial measure of performance for Oja’s algorithm is the $\sin^2$ error, which quantifies the discrepancy between the principal eigenvector of $\Sigma$ (the true population eigenvector, $v_1$) and the Oja vector, $\voja$. Notably, several studies~\citep{jain2016streaming, allenzhu2017efficient, DBLP:journals/corr/abs-2102-03646} have shown that Oja’s algorithm attains the same error as its offline counterpart, which computes the leading eigenvector of the empirical covariance matrix directly. More concretely, it has been shown that for an appropriately defined variance parameter $\Nu$ (equation~\eqref{eq:Nu_assumptions}),
\bas{
\sin^2(v_1, \voja) \defeq 1-(v_1^T\voja)^2=O\bb{\frac{\Nu}{n(\lambda_1-\lambda_2)^2}}.
}

\textbf{$\ell_\infty$ error bounds.} There is an extensive body of research on eigenvector perturbations of matrices. Most traditional bounds~\citep{davis1970rotation,wedin1972perturbation,stewart1990matrix} measure error using the $\ell_2$ norm or other unitarily invariant norms. However, for machine learning and statistics applications, element-wise error bounds give a finer picture of the error in the estimated projection of \textit{a feature} in a given direction. This area has recently gained traction for random matrices. \cite{eldridge2018unperturbed,abbe2020entrywise,cape2017singular,abbe2022lptheory} provide $\ell_{2\rightarrow \infty}$ bounds for eigenvectors and singular vectors of random matrices with low-rank structure. \cite{cape2017singular} show an $\ell_{2\rightarrow \infty}$ norm bound for the error of the singular vectors of a covariance matrix formed by $n$ \iid Gaussian vectors; as long as $\lambda_1-\lambda_2>0$ and $v_1$ satisfies certain incoherence conditions, there exists a $w \in \{-1,1\}$ such that with probability $1-d^{-2}$, the top eigenvector $\hat{v}_1$ of the sample covariance matrix satisfies, up to logarithmic factors,
\bas{
    \|v_1-w \hat{v}_1\|_\infty
    &\lesssim
    \sqrt{\frac{\Tr\bb{\Sigma}/\lambda_{1}}{n}}\bb{\frac{\max_i\sqrt{\Sigma_{ii}}}{\sqrt{\lambda_1}}+\frac{\lambda_{2}}{\lambda_{1}}}  \\
    & \;\;\;\;\;\;\;\; +  \frac{\Tr\bb{\Sigma}/\lambda_1}{n} \bb{\frac{1}{\sqrt{d}} + \sqrt{\frac{\lambda_{2}}{\lambda_{1}}}}. 
}
The guarantees of~\cite{cape2017singular} are offline and provide a common upper bound on all coordinates. Our algorithm has error guarantees that scale with the variances of the coordinates. % \rd runs in time $O(nd)$, and takes space $\tilde{O}(d)$.\bk

\textbf{Uncertainty estimation for SGD.} %In Stochastic Gradient Descent (SGD) literature~\cite{robbins1951,Ruppert1988EfficientEF,polyak1992sgd,nemirovsky2009,bach2011sgd}, there has been a lot of interest towards uncertainty estimation. 
For convex loss functions, the foundational work of~\cite{polyak1992sgd,ruppert1988efficient,SGD_bather1989stochastic} on Stochastic Gradient Descent (SGD) demonstrates that averaged SGD iterates are asymptotically Gaussian. A significant body of research has focused on the convex setting, including notable works on covariance matrix estimation~\citep{SGD_conf/aaai/LiLKC18,su2018uncertainty,SGD_JMLR:v19:17-370,chen2020SGD,SGD_lee2022fast, zhu2023online}.
In comparison, work on uncertainty estimation for nonconvex loss functions is relatively scarce~\citep{yu2020nonconvexAnalysis,zhong2023online}. \citet{yu2020nonconvexAnalysis} establish a Central Limit Theorem (CLT) under relaxations of strong convexity assumptions. \citet{zhong2023online} weaken the conditions but rely on online multiplier bootstrap methods to estimate the asymptotic covariance matrix. Existing methods for estimating and storing the full covariance matrix suffer from numerical instability or slow convergence rates (see~\cite{pmlr-v206-chee23a}). For convex functions and their relaxations, \citet{zhu2024high,carter2025statistical} present computationally efficient uncertainty estimation approaches that are related to, but different from, ours.
   
In large-scale, high-dimensional problems, maintaining numerous bootstrap replicas is computationally expensive. \cite{pmlr-v206-chee23a} introduce a scalable method for confidence intervals around SGD iterates, which are informative yet conservative under regularity conditions such as strong convexity at the optimum. In their setting, for an appropriate initial learning rate, the covariance matrix can be approximated by a constant multiple of the identity (see also~\cite{ljung1992plusminusref}). In our setting, such an approximation requires knowledge of all eigenvalues and eigenvectors of $\Sigma$. The work most relevant to ours is that of~\cite{lunde2021bootstrapping}, who provide asymptotic distributions for the sin-squared error of the Oja vector and present an online multiplier bootstrap algorithm to estimate the underlying distribution.

\textbf{Resampling Methods and Bootstrapping.}
The nonparametric bootstrap~\citep{efron1979bootstrap,hall1992bootstrap,efron1993introduction} is a resampling method in which $b$ resamples of a given dataset of size $n$ are drawn with replacement and treated as $b$ independent samples drawn from the underlying distribution. Among the many bootstrap variants, the one widely used in SGD inference is the online multiplier bootstrap, where multiple bootstrap resamples are updated in a streaming manner by sampling multiplier random variables to emulate the inherent uncertainty in the data~\citep{ramprasad2023online, zhong2023online, lunde2021bootstrapping}.%~\cite{blbjrssb} use s

A major concern with the bootstrap is its computational cost. Maintaining many bootstrap replicates is computationally prohibitive if the number of data points $n$ and the dimension $d$ are large. Some computationally cheaper alternatives to the bootstrap are subsampling~\citep{politis1999subsampling,Politis_10.1093/biomet/asad021, bertail1999subsampling, levina2017subsampling, chaudhuri2024differentially, chua2024scalable} and the $m$-out-of-$n$ bootstrap~\citep{Bickel-m-out-of-n, bickel2008choice, sakov1998using, andrews2010asymptotic}, which rely on drawing $o(n)$ samples without and with replacement, respectively. These methods are used in~\cite{blbjrssb} to create $n$ with-replacement samples from smaller subsamples, but require multiple bootstrap replicates and are not directly applicable to the streaming setting.
%This has been one of the reasons why bootstrap has not gained a lot of traction in uncertainty quantification for modern big data settings. 

% \rd 
% \textbf{Our work.} We provide confidence intervals around estimates of $v_1(i) = e_i^{\top}v_1$ for all $i \in [d]$. Proposition~\ref{prop:main:clt} shows that the asymptotic marginal variance of $\voja(i)$ is governed by the variance of the leading term, $\Psi_{n,1}(i)$ (see Eq~\eqref{eq:hoeffding}). Since $v_1$ is unknown, $\roja$ (see Eq~\eqref{eq:per_coord_bound}) cannot be computed directly. We circumvent this issue by computing a proxy $\troja$ of $\roja$ using a high-accuracy estimate of $v_1$. A naive estimate of the variance of the $i^{\mathsf{th}}$ entry is to return $\troja(i)^2$. This does not concentrate strongly enough around the true variance to yield a uniform bound over all coordinates. We alleviate this by a median-of-means procedure. The key observation is that the elements of the (scaled) residual vector $(\eta_n\bb{\eigengap})^{-1/2} \roja$ in equation~\eqref{eq:per_coord_bound} behaves like a normal distribution with covariance matrix converging to a limiting covariance matrix $\V$.\bk