Abstract: We consider the problem of performing linear regression over a stream of d-dimensional examples, and show that any algorithm that uses a subquadratic amount of memory exhibits a slower rate of convergence than can be achieved without memory constraints. Specifically, consider a sequence of labeled examples (a₁, b₁), (a₂, b₂), …, with each aᵢ drawn independently from a d-dimensional isotropic Gaussian, and where bᵢ = ⟨aᵢ, x⟩ + ηᵢ for a fixed x ∈ ℝᵈ with ‖x‖₂ = 1 and independent noise ηᵢ drawn uniformly from the interval [−2^(−d/5), 2^(−d/5)]. We show that any algorithm with at most d²/4 bits of memory requires at least Ω(d log log(1/ε)) samples to approximate x to ℓ₂ error ε with probability of success at least 2/3, for ε sufficiently small as a function of d. In contrast, for such ε, x can be recovered to error ε with probability 1 − o(1) using memory O(d² log(1/ε)) and only d examples. These are the first nontrivial lower bounds for regression with super-linear memory, and they may open the door to strong memory/sample tradeoffs for continuous optimization.
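The upper bound in the contrast above (d examples, roughly d² stored numbers) can be illustrated with a minimal sketch. It assumes the data model exactly as stated (isotropic Gaussian aᵢ, uniform noise on [−2^(−d/5), 2^(−d/5)]) and recovers x by storing the first d examples and solving the resulting square linear system. The helper names (sample_stream, solve, recover_from_d_examples) are hypothetical and chosen for illustration only; the sketch uses ordinary floats and does not model the O(d² log(1/ε)) bit-precision accounting, and it is not presented as the authors' procedure.

```python
import math
import random


def sample_stream(x, n, rng):
    """Yield n labeled examples (a_i, b_i) with b_i = <a_i, x> + eta_i (assumed model)."""
    d = len(x)
    half_width = 2.0 ** (-d / 5)  # eta_i ~ Uniform[-2^(-d/5), 2^(-d/5)]
    for _ in range(n):
        a = [rng.gauss(0.0, 1.0) for _ in range(d)]  # isotropic Gaussian example
        eta = rng.uniform(-half_width, half_width)
        b = sum(ai * xi for ai, xi in zip(a, x)) + eta
        yield a, b


def solve(A, b):
    """Solve the square system A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))
        z[r] = s / M[r][r]
    return z


def recover_from_d_examples(stream, d):
    """Buffer the first d examples (about d^2 numbers in memory) and solve A z = b."""
    rows, labels = [], []
    for a, b in stream:
        rows.append(a)
        labels.append(b)
        if len(rows) == d:
            break
    return solve(rows, labels)


# Usage: a random unit vector x in R^d, recovered from exactly d streamed examples.
rng = random.Random(0)
d = 100
x = [rng.gauss(0.0, 1.0) for _ in range(d)]
norm = math.sqrt(sum(v * v for v in x))
x = [v / norm for v in x]
x_hat = recover_from_d_examples(sample_stream(x, d, rng), d)
err = math.sqrt(sum((u - v) ** 2 for u, v in zip(x_hat, x)))  # l2 recovery error
```

Because the noise width 2^(−d/5) shrinks exponentially in d, the error of this direct solve is governed mainly by the conditioning of the d × d Gaussian matrix; the point of the lower bound is that an algorithm restricted to d²/4 bits cannot afford to buffer this whole system.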