['6,31c6,28', '< We consider the problem of approximating the top eigenvector in the streaming setting. In this problem, we are given vectors a 1 , . . . , a n ∈ R d one at a time in a stream. Let A be an n × d matrix with rows a 1 , . . . , a n . The task is to approximate the top eigenvector of the matrix A T A. Throughout the paper, we use v 1 ∈ R d to denote the top eigenvector of A T A. We focus on obtaining streaming algorithms that use a small amount of space and can output a unit vector v such that ⟨v, v 1 ⟩ 2 ≥ 1f (R), where f (R) is a decreasing function in the gap R = λ 1 (A T A)/λ 2 (A T A). Here λ 1 (•), λ 2 (•) denote the two largest eigenvalues. As the gap R becomes larger, the eigenvector approximation problem becomes easier and we want more accurate approximations to the eigenvector v 1 .', '< If one is allowed to use Õ(d 2 ) 2 bits of space, we can maintain the matrix A T A = i a i a T i as we see the rows a i in the stream, and at the end of processing the stream, we can compute the exact top eigenvector v 1 . When the dimension d is large, the requirement of Ω(d 2 ) bits of memory can be impractical (see e.g., applications that require a large value of d in Mitliagkas et al. (2013).) Hence, an interesting question is to study non-trivial streaming algorithms that use less memory. In this work, we focus on obtaining algorithms that use Õ(d) bits of space.', '< In the offline setting (where the entire matrix A is available to us), fast iterative algorithms such as Gu (2015); Musco and Musco (2015); Musco et al. (2018) can be used to quickly obtain accurate approximations to the top eigenvector when the gap R = Ω(1). 
In a single pass streaming setting, we cannot run these algorithms as these iterative algorithms need to see the entire matrix multiple times.', "< There have been two major lines of work studying the problem of eigenvector approximation and the related Principal Component Analysis (PCA) problem in the streaming setting with near-linear in d memory. In the first line of work, each row encountered in the stream is assumed to be sampled independently from an unknown distribution with mean 0 and covariance Σ and the task is to approximate the top eigenvector of Σ using the samples. In this line of work, the sample complexity required for algorithms using O(d • polylog(d)) bits of space to output an approximation to v 1 , is the main question. The algorithms are usually a variant of Oja's algorithm (Oja, 1982;Jain et al., 2016; Allen-Zhu and Li, 2017; Huang et al., 2021; Kumar and Sarkar, 2023) or the block power method (Hardt and Price, 2014;Balcan et al., 2016). We note that Kumar and Sarkar (2023) relax the i.i.d. assumption and analyze the sample complexity of Oja's algorithm for estimating the top eigenvector in the Markovian data setting.", '< The other line of work studies algorithms for arbitrary streams appearing in an arbitrary order. In this setting, we want algorithms to work for any input stream given in any order. A problem closely related to the eigenvector estimation problem is the Frobenius-norm Low Rank Approximation (Clarkson and Woodruff, 2017; Boutsidis et al., 2016;Upadhyay, 2016;Ghashami et al., 2016). The deterministic Frequent Directions sketch of Ghashami et al. (2016) can, using Õ(d/ε) bits of space, output a unit vector u such that', '< ∥A(I -uu T )∥ 2 F ≤ (1 + ε)∥A(I -v 1 v T 1 )∥ 2 F .', "< Although the vector u is a 1 + ε approximate solution to the Frobenius norm Low Rank Approximation problem, it is possible that the vector u may be (nearly) orthogonal to the top eigenvector v 1 . 
Hence the Frequent Directions sketch does not guarantee top eigenvector approximation. Recently, Price and Xun (2024) study the eigenvector approximation problem in arbitrary streams and obtain results in terms of the gap R of the instance. Price and Xun prove that when R = Ω(log n • log d), a variant of Oja's algorithm outputs a unit vector v such that", '< ⟨v, v 1 ⟩ 2 ≥ 1 - C log d R - 1 poly(d)', '< where C is a large enough universal constant. On the lower bound side, Price and Xun showed that any algorithm that outputs a vector v satisfying ⟨v, v 1 ⟩ 2 ≥ 1 -1 CR 2 , must use Ω(d 2 /R3 ) bits of space while processing the stream. This lower bound shows that in the important case of R = O(1), the correlation 3 that can be obtained by an algorithm using Õ(d) bits of space is at most a constant less than 1. Thus, the current best algorithms for arbitrary streams work only when R = Ω(log n • log d) and for the important case of R = O(1), there are no existing algorithms requiring significantly fewer than d 2 bits of memory. They also give a lower bound on the size of mergeable summaries for approximating the top eigenvector.', "< We identify an instance with R = Θ(log d/ log log d) where the algorithm of Price and Xun fails to produce a vector with even a constant correlation with the vector v 1 . This shows that their algorithm or other variants of Oja's algorithm may fail to extend to the case when R = O(1). We further show that the algorithm of Price and Xun fails to produce such a vector even when the rows in our hard instance are ordered uniformly at random, showing that even randomly ordered streams can be hard to solve for variants of Oja's algorithm.", '< In this work, we focus on algorithms that work on worst case inputs A while assuming that the rows of A are uniformly randomly ordered. This model is mid-way between the i.i.d. setting and the arbitrary order stream setting in terms of the generality of streams that can be modeled. 
We note that a number of works (Munro and Paterson, 1980;Guha et al., 2005;Chakrabarti et al., 2008;Guha and McGregor, 2009;Assadi and Sundaresan, 2023) have previously considered streaming algorithms and lower bounds for worst case inputs with random order streams, as it is a natural model often arising in practical settings. Our algorithms are parameterized in terms of the number of heavy rows in the stream. See Gupta and Singla (2021) for a gentle introduction to the random-order model. We define a row a i to be heavy if ∥a i ∥ 2 ≥ ∥A∥ F / d • polylog(d). Note that in any stream of rows, by definition, there are at most d • polylog(d) heavy rows. We state our theorem informally below: Theorem 1.1. Let a 1 , . . . , a n ∈ R d be a randomly ordered stream and let A denote the n × d matrix with rows given by a 1 , . . . , a n . If R = λ 1 (A T A)/λ 2 (A T A) > C for a large enough constant C and the number of heavy rows in the stream is at most h, then there is a streaming algorithm using O(h • d • polylog(d)) bits of space and outputting a unit vector v satisfying', '< ⟨v, v 1 ⟩ 2 ≥ 1 -O(1/ √ R)', '< with a probability ≥ 4/5.', '< Our algorithm is a variant of the block power method. Along the way, we also improve the gap requirements in the results of Price and Xun (2024). We show that by subsampling a stream of rows, the algorithm of Price and Xun can be made to work even when the gap R is Ω(log 2 d) in arbitrary order streams, improving on the Ω(log n • log d) requirement in their analysis. We also show that in random order streams, a gap of Ω(log d) is sufficient for their algorithm, though our algorithm improves on this and works for even a constant gap.', '< Similar to the lower bound of Price and Xun, we show that any algorithm for random order streams must use Ω(h • d/R) bits of space to output a vector v satisfying ⟨v, v 1 ⟩ 2 ≥ 1 -1/CR 2 where C is a constant. We summarize the theorem below. Theorem 1.2. Consider an arbitrary random order stream a 1 , . 
. . , a n with the gap parameter', '< σ1(A) 2 σ2(A) 2 = R.', '< Let h be the number of heavy rows in the stream. Any streaming algorithm that outputs a unit vector v such that', '< ⟨v, v 1 ⟩ 2 ≥ 1 -1/CR 2', '< for a large enough constant C, with a probability ≥ 1 -(1/2) R+1 over the ordering of the stream and its internal randomness, must use Ω(h • d/R) bits of space.', '< Techniques. The randomized power method (Gu, 2015) algorithm to approximate the top eigenvector samples a random Gaussian vector g and iteratively computes the vector v = (A T A) t g4 for t = Θ(log d) iterations and shows that when the gap R is large, v/∥v∥ 2 is a good approximation for v 1 . Thus, the algorithm needs to see the quadratic form A T A multiple times and hence, it cannot be implemented in the single-pass streaming setting of this paper.', '< Assume that the stream is randomly ordered and that there are no heavy rows. Our key observation is that if the stream is long enough, then we can see t approximations B T j B j5 of the quadratic form A T A. Here the matrices B 1 , . . . , B t are formed by sampling and rescaling the rows of the matrix A and importantly, the rows of B 1 , . . . , B t do not overlap in the stream, that is, they appear one after the other. Thus we can compute', '< v ′ = (B T t B t ) • • • (B T 1 B 1 )', '< • g for the starting vector g in a single pass over the stream. We prove that such matrices B j exist using the row norm sampling result of Magdon-Ismail (2010). Now, the main issue is to show that v ′ /∥v ′ ∥ 2 is a good approximation to the top eigenvector v 1 . We crucially use a singular value inequality of Wang and Xi (1997) to prove that ∥B T j B j -A T A∥ 2 ≤ ε∥A∥ 2 2 for all j suffices for v ′ /∥v ′ ∥ 2 to be a good approximation to v 1 . The above analysis assumes that there are no heavy rows. Indeed, suppose that a matrix A has a row a with a large Euclidean norm which is orthogonal to all the other rows. 
Also assume that the top eigenvector of the matrix A is in this direction. Since, the matrices B 1 , . . . , B t are non-overlapping substreams of the matrix A, at most one of the matrices B j can have the row a and hence the vector v ′ /∥v ′ ∥ 2 will not be a good approximation to a/∥a∥ 2 , the top eigenvector. Thus, we need to handle the heavy rows separately. We show that, by storing all the rows with a Euclidean norm larger than ∥A∥ F / d • polylog(d) and running the above described algorithm on the remaining set of rows, we can obtain a good approximation to the top eigenvector.', '< Our lower bound (Theorem 1.2) shows that any single-pass streaming algorithm must use space proportional to the number of heavy rows, and therefore our procedure that handles the heavy rows separately gives near-optimal bounds. Finally, the row norm sampling technique of Magdon-Ismail (2010) serves as a general technique to reduce the number of rows in the stream while (approximately) preserving the top eigenvector. We use this observation to improve the R = Ω(log n • log d) for arbitrary streams in Price and Xun (2024) to R = Ω(log 2 d). We then show that assuming a uniformly random order, the analysis of Price and Xun (2024) can be improved to show that R = Ω(log d) suffices. Thus, for random order streams, techniques before our work can be used to approximate the top eigenvector when the gap R = Ω(log d). Our work improves upon this to give an algorithm that works for streams with R = Ω(1).', "< Implications to practice. Often, in practical situations, we can assume that the rows being streamed are sampled independently from a nice-enough distribution, in which case Oja's algorithm, as discussed, can approximate the top eigenvector accurately given enough samples. 
However, independence and assumptions on the covariance matrix can be very strong assumptions in some cases and in such cases, our algorithm only requires that the order of the rows in the stream be uniformly random, in which case we output an approximation with provable guarantees.", '< Organization. We first introduce the row-norm sampling procedure to obtain approximate quadratic forms. The proof is a slight modification of that of Magdon-Ismail (2010). The only difference is that we instead consider a version that samples each row in the input independently with some appropriate probability and keeps the rows that are sampled after scaling appropriately. We then introduce and analyze our block power iteration algorithm when all rows have roughly the same Euclidean norm, and then extend it to the general case, which is our main result. Finally, we provide a lower bound showing that Ω(td/R) bits of space is necessary to obtain constant correlation with the top eigenvector. Due to space constraints, all of our proofs are placed in the appendix.', '---', '> The problem of approximating the top eigenvector in a streaming setting is fundamental in various applications, from principal component analysis to network analysis. Here, we are given a sequence of vectors a₁, ..., aₙ ∈ ℝᵈ one at a time, forming the rows of an n×d matrix A. Our objective is to approximate the top eigenvector v₁ of the matrix AᵀA (equivalently, the top right singular vector of A). We seek streaming algorithms that use a small amount of space and output a unit vector v such that ⟨v, v₁⟩² ≥ 1 - f(R), where f(R) is a decreasing function of the spectral gap R = λ₁(AᵀA)/λ₂(AᵀA). Here, λ₁(⋅) and λ₂(⋅) denote the two largest eigenvalues. 
Intuitively, a larger spectral gap R makes the approximation task easier, so we want more accurate approximations of v₁ as R grows.', '> ', '> Maintaining the full matrix AᵀA = Σᵢ aᵢaᵢᵀ as the rows arrive requires Õ(d²) bits of space and yields the exact top eigenvector v₁ at the end of the stream, but this becomes impractical for high-dimensional data (e.g., as noted by Mitliagkas et al. (2013)). This motivates the study of non-trivial streaming algorithms that use less memory; we focus on algorithms that use Õ(d) bits of space.', '> ', '> In an offline setting, where the entire matrix A is accessible, fast iterative algorithms (e.g., Gu (2015); Musco and Musco (2015); Musco et al. (2018)) can rapidly yield accurate top eigenvector approximations when R = Ω(1). However, these iterative methods need to see the entire matrix multiple times, so they cannot be run in a single-pass streaming setting.', '> ', "> Research into streaming eigenvector approximation and Principal Component Analysis (PCA) with near-linear in d memory has largely followed two distinct paths. One line of work assumes that each streamed row is independently sampled from an unknown distribution with zero mean and covariance Σ, and the goal is to approximate the top eigenvector of Σ from the samples. The main question in this line of work is the sample complexity that algorithms using O(d ⋅ polylog(d)) bits of space need in order to output an approximation to v₁. These algorithms are usually variants of Oja's algorithm (Oja, 1982; Jain et al., 2016; Allen-Zhu and Li, 2017; Huang et al., 2021; Kumar and Sarkar, 2023) or the block power method (Hardt and Price, 2014; Balcan et al., 2016). Notably, Kumar and Sarkar (2023) relax the i.i.d. assumption and analyze the sample complexity of Oja's algorithm in the Markovian data setting.", '> ', "> The second major research direction considers arbitrary input streams, where rows can appear in any order. A closely related problem is Frobenius-norm Low Rank Approximation (Clarkson and Woodruff, 2017; Boutsidis et al., 2016; Upadhyay, 2016; Ghashami et al., 2016). 
The deterministic Frequent Directions sketch (Ghashami et al., 2016) can, using Õ(d/ε) bits of space, output a unit vector u such that ∥A(I - uuᵀ)∥_F² ≤ (1 + ε)∥A(I - v₁v₁ᵀ)∥_F². Although u is a (1 + ε)-approximate solution to the low rank approximation problem, it may be (nearly) orthogonal to v₁, so the sketch does not guarantee top eigenvector approximation. Price and Xun (2024) recently investigated eigenvector approximation in arbitrary streams, providing results in terms of the spectral gap R. They showed that for R = Ω(log n ⋅ log d), a variant of Oja's algorithm produces a unit vector v with ⟨v, v₁⟩² ≥ 1 - C log d / R - 1/poly(d), where C is a large enough universal constant. Their lower bound shows that any algorithm achieving ⟨v, v₁⟩² ≥ 1 - 1/(CR²) must use Ω(d²/R³) bits of space, implying that for the important case of R = O(1), an algorithm using Õ(d) bits of space can obtain correlation at most a constant less than 1. Thus the best existing algorithms for arbitrary streams work only when R = Ω(log n ⋅ log d), and for R = O(1) no existing algorithm uses significantly fewer than d² bits of memory. They also established a lower bound on the size of mergeable summaries for approximating the top eigenvector.", '> ', "> We identify an instance with R = Θ(log d / log log d) on which the algorithm of Price and Xun fails to achieve even constant correlation with v₁. This shows that their algorithm, or other variants of Oja's algorithm, may fail to extend to the case R = O(1). The failure persists even when the rows of the hard instance are ordered uniformly at random.", '> ', '> Our work focuses on algorithms for worst-case input matrices A, under the assumption that the rows are presented in a uniformly random order. This model lies mid-way between the i.i.d. setting and the arbitrary-order stream setting in terms of the generality of streams it can model, and it arises naturally in practice (Munro and Paterson, 1980; Guha et al., 2005; Chakrabarti et al., 2008; Guha and McGregor, 2009; Assadi and Sundaresan, 2023); see Gupta and Singla (2021) for a gentle introduction to the random-order model. Our algorithms are parameterized by the number of "heavy" rows, defined as rows aᵢ with Euclidean norm ∥aᵢ∥₂ ≥ ∥A∥_F / √(d ⋅ polylog(d)). By definition, any stream contains at most d ⋅ polylog(d) heavy rows. 
We informally state our main contributions:', '> ', '> **Theorem 1.1 (Informal).** For a randomly ordered stream a₁, ..., aₙ ∈ ℝᵈ forming matrix A, if the spectral gap R = λ₁(AᵀA)/λ₂(AᵀA) > C for a sufficiently large constant C, and there are at most h heavy rows, then a streaming algorithm exists that uses O(h ⋅ d ⋅ polylog(d)) bits of space and outputs a unit vector v satisfying ⟨v, v₁⟩² ≥ 1 - O(1/√R) with probability ≥ 4/5.', '> ', "> Our algorithm is a variant of the block power method. Along the way, we also improve the gap requirements in the results of Price and Xun (2024). We show that, by subsampling the stream of rows, their algorithm can be made to work when R = Ω(log² d) in arbitrary order streams (improving on the Ω(log n ⋅ log d) requirement in their analysis). For random order streams, we show that R = Ω(log d) suffices for their algorithm, though our algorithm improves on this and works even for a constant gap R = Ω(1).", '> ', '> **Theorem 1.2 (Informal).** For an arbitrary random order stream a₁, ..., aₙ with spectral gap R = σ₁(A)²/σ₂(A)² and h heavy rows, any streaming algorithm that outputs a unit vector v such that ⟨v, v₁⟩² ≥ 1 - 1/(CR²) for a large enough constant C, with probability ≥ 1 - (1/2)ᴿ⁺¹ (over stream ordering and internal randomness), must use Ω(h ⋅ d/R) bits of space.', '> ', "> This lower bound shows that any single-pass streaming algorithm must use space proportional to the number of heavy rows, so handling the heavy rows separately gives near-optimal bounds. The row norm sampling technique of Magdon-Ismail (2010) is a general tool for reducing the number of rows in the stream while (approximately) preserving the top eigenvector, which we use to improve the gap requirements for Price and Xun's algorithm.", '> ', '> **Techniques.** The randomized power method (Gu, 2015) samples a random Gaussian vector g and iteratively computes v = (AᵀA)ᵗg for t = Θ(log d) iterations. When the gap R is large, v/∥v∥₂ is a good approximation to v₁. However, this requires seeing the quadratic form AᵀA multiple times, so it cannot be implemented in the single-pass streaming setting. 
Our key observation for random order streams without heavy rows is that if the stream is long enough, we can see t approximations BⱼᵀBⱼ of AᵀA, where the matrices B₁, ..., Bₜ are formed by sampling and rescaling rows of A and do not overlap in the stream, that is, they appear one after the other. This allows us to compute v\' = (BₜᵀBₜ) ⋅ ... ⋅ (B₁ᵀB₁)g in a single pass. We prove the existence of such Bⱼ using row norm sampling (Magdon-Ismail, 2010). Using a singular value inequality of Wang and Xi (1997), we then show that ∥BⱼᵀBⱼ - AᵀA∥₂ ≤ ε∥A∥₂² for all j suffices for v\'/∥v\'∥₂ to be a good approximation to v₁. Heavy rows pose a challenge: a single row with a large Euclidean norm that is orthogonal to all other rows can determine the top eigenvector, yet it appears in at most one Bⱼ, so v\'/∥v\'∥₂ will not approximate it well. We address this by storing the heavy rows separately (O(h ⋅ d ⋅ polylog(d)) bits of space) and running our algorithm on the remaining "light" rows. Our lower bound (Theorem 1.2) confirms that space proportional to the number of heavy rows is necessary.', '32a30,33', "> **Implications to practice.** Oja's algorithm can approximate the top eigenvector accurately when the rows are sampled independently from a nice-enough distribution, but independence and assumptions on the covariance matrix can be very strong. Our algorithm offers provable guarantees under the weaker assumption that the rows arrive in uniformly random order.", '> ', '> **Organization.** We begin with the row-norm sampling procedure, a slight modification of Magdon-Ismail (2010) that samples each row independently with an appropriate probability and rescales the sampled rows. We then introduce and analyze our block power iteration algorithm, first when all rows have roughly the same Euclidean norm, then extending it to the general case, which is our main result. Finally, we establish a lower bound showing that Ω(td/R) bits of space are necessary to obtain constant correlation with the top eigenvector. All proofs are deferred to the appendix due to space constraints.', '> ', '553d553', '< ']
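
The single-pass idea described under Techniques can be illustrated with a minimal sketch, assuming a randomly ordered stream with no heavy rows. The function name `streaming_block_power`, the parameter `num_blocks`, and the use of equal-length contiguous blocks in place of row-norm sampling are illustrative assumptions; this is not the authors' algorithm and carries none of its guarantees.

```python
import numpy as np


def streaming_block_power(rows, d, num_blocks, rng):
    """Single pass over `rows` (iterable of length-d vectors, with len()).

    Hypothetical sketch: the stream is cut into `num_blocks` contiguous,
    non-overlapping blocks B_1, ..., B_t. For each block we accumulate
    B_j^T B_j v one row at a time, then normalize. Only O(d) numbers are
    kept in memory. Assumes a random row order and no heavy rows.
    """
    n = len(rows)
    block_len = n // num_blocks
    if block_len == 0:
        raise ValueError("stream is too short for the requested number of blocks")

    # Random Gaussian starting vector g, normalized.
    v = rng.standard_normal(d)
    v /= np.linalg.norm(v)

    acc = np.zeros(d)  # running value of B_j^T B_j v for the current block
    for i, a in enumerate(rows, start=1):
        acc += a * (a @ v)  # adds a a^T v without forming a d x d matrix
        if i % block_len == 0:
            norm = np.linalg.norm(acc)
            if norm > 0:
                v = acc / norm  # one power-iteration step per block
            acc = np.zeros(d)
    # Rows left over after the last full block are ignored in this sketch.
    return v
```

On a synthetic stream with a planted direction (for example, Gaussian rows whose first coordinate is scaled by 5, giving a gap of roughly 25), the output can be compared against the top eigenvector of AᵀA computed offline with `np.linalg.eigh`. Normalizing after each block only rescales the iterate and does not change its direction.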
