Keywords: Regression, Streaming, Online algorithms, Flows
TL;DR: We give new streaming algorithms for underconstrained regression when we see the columns one at a time, and obtain related results for flows.
Abstract: We initiate the study of one-pass streaming algorithms for underdetermined $\ell_p$ linear regression problems of the form
$$
\min_{\mathbf A\mathbf x = \mathbf b} \lVert\mathbf x\rVert_p \,, \qquad
\text{where } \mathbf A \in \mathbb R^{n \times d} \text{ with } n \ll d \,,
$$
which generalizes basis pursuit ($p = 1$) and least squares solutions to
underdetermined linear systems ($p = 2$). We study the column-arrival
streaming model, in which the columns of $\mathbf A$ are presented one by one in a
stream. When $\mathbf A$ is the incidence matrix of a graph, this corresponds to an
edge-insertion graph stream, and the regression problem captures $\ell_p$
flows, which include transshipment ($p = 1$), electrical flows ($p = 2$), and
max flow ($p = \infty$) on undirected graphs as special cases. Our goal is to
design algorithms that use space much smaller than the length $d$ of the
stream.
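To make the column-arrival model concrete, here is a minimal Python/NumPy sketch of the simplest case $p = 2$; it is a textbook baseline, not an algorithm from this work. It uses the standard fact that, for consistent $\mathbf b$, the minimum-norm solution is $\mathbf x^* = \mathbf A^\top(\mathbf A\mathbf A^\top)^+\mathbf b$, so $\lVert\mathbf x^*\rVert_2^2 = \mathbf b^\top(\mathbf A\mathbf A^\top)^+\mathbf b$, and the $n \times n$ matrix $\mathbf A\mathbf A^\top = \sum_i \mathbf a_i\mathbf a_i^\top$ can be accumulated one column at a time. For an edge stream, $\mathbf A\mathbf A^\top$ is the graph Laplacian and the cost is the square root of the electrical energy of routing the demand $\mathbf b$ (whose entries must sum to zero). The helper names `stream_l2_cost` and `incidence_column` are hypothetical.

```python
import numpy as np

def stream_l2_cost(columns, b):
    """min ||x||_2 subject to Ax = b, from a single pass over the columns of A.

    Only the n x n Gram matrix M = sum_i a_i a_i^T = A A^T is kept, so the
    memory is O(n^2) numbers no matter how long the stream (d) is.
    Assumes b lies in the column space of A (the system is consistent).
    """
    n = b.shape[0]
    M = np.zeros((n, n))
    for a in columns:  # each a is one column of A, a length-n array
        M += np.outer(a, a)
    # x* = A^T M^+ b, hence ||x*||_2^2 = b^T M^+ b.
    # pinv also handles singular M, such as a graph Laplacian.
    y = np.linalg.pinv(M) @ b
    return float(np.sqrt(b @ y))

def incidence_column(u, v, n):
    """Incidence-matrix column e_u - e_v for an undirected edge (u, v)."""
    a = np.zeros(n)
    a[u], a[v] = 1.0, -1.0
    return a
```

For the path 0-1-2-3 with $\mathbf b = \mathbf e_0 - \mathbf e_3$, the function returns $\sqrt 3$ (three unit-resistance edges in series); inserting the edge $(0, 3)$ lowers it to $\sqrt{3/4}$. This baseline stores $O(n^2)$ numbers regardless of $d$.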
For the task of estimating the cost of the $\ell_p$ regression problem for
$p\in[2,\infty]$, we give a streaming algorithm that constructs a sparse
instance supported on $\tilde O(\varepsilon^{-2}n)$ columns of $\mathbf A$
and approximates the cost up to a $(1\pm\varepsilon)$ factor. This
corresponds to $\tilde O(\varepsilon^{-2}n^2)$ bits of space in general and
yields an $\tilde O(\varepsilon^{-2}n)$-space semi-streaming algorithm for
constructing $\ell_p$ flow sparsifiers on graphs. This extends to
$p\in(1,2)$ with $\tilde O(\varepsilon^{-2}n^{q/2})$ columns, where $q$ is the Hölder
conjugate exponent of $p$. For $p = 2$, we show that $\Omega(n^2)$ bits of
space are required in general even for outputting a constant-factor
solution. For $p = 1$, we show that the cost cannot be estimated even to an
$o(\sqrt n)$ factor in $\mathrm{poly}(n)$ space.
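The idea of a sparse instance that preserves the cost can be illustrated, for $p = 2$ only and offline, with classical leverage-score sampling: keep column $i$ with probability proportional to $\tau_i = \mathbf a_i^\top(\mathbf A\mathbf A^\top)^+\mathbf a_i$ and rescale it so that the sampled Gram matrix is an unbiased estimate of $\mathbf A\mathbf A^\top$. This is a swapped-in textbook technique, not the one-pass construction of this work (which the abstract does not specify and which must also handle $p \neq 2$); the function names and the sample-size parameter `m` are illustrative assumptions.

```python
import numpy as np

def leverage_score_sparsify(A, m, rng):
    """Keep about m columns of A (n x d), sampled by leverage score and rescaled.

    Kept column i is divided by sqrt(p_i), so E[S S^T] = A A^T.
    Offline: it forms A A^T first, unlike a one-pass streaming algorithm.
    """
    M_pinv = np.linalg.pinv(A @ A.T)
    # tau_j = a_j^T (A A^T)^+ a_j; the scores sum to rank(A) <= n.
    tau = np.einsum("ij,ik,kj->j", A, M_pinv, A)
    probs = np.minimum(1.0, m * tau / tau.sum())
    keep = rng.random(A.shape[1]) < probs
    return A[:, keep] / np.sqrt(probs[keep])

def l2_cost(A, b):
    """min ||x||_2 subject to Ax = b, computed as sqrt(b^T (A A^T)^+ b)."""
    return float(np.sqrt(b @ np.linalg.pinv(A @ A.T) @ b))
```

Usage: `S = leverage_score_sparsify(A, m, np.random.default_rng(0))`, then compare `l2_cost(S, b)` with `l2_cost(A, b)`. With $m$ on the order of $\varepsilon^{-2} n \log n$, leverage-score sampling is known to give a $(1\pm\varepsilon)$ spectral approximation of $\mathbf A\mathbf A^\top$ with high probability, and hence of the $\ell_2$ cost; the code itself only samples and does not check this guarantee.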
On the other hand, if we are interested in outputting a solution $\mathbf
x$, then we show that $(1+\varepsilon)$-approximations require $\Omega(d)$
space for $p > 1$, and in general, $\kappa$-approximations require
$\tilde\Omega(d/\kappa^{2q})$ space for $p > 1$. We complement these lower
bounds with the first sublinear space upper bounds for this problem, showing
that we can output a $\kappa$-approximation using space only
$\mathrm{poly}(n) \cdot \tilde O(d/\kappa^q)$ for $p > 1$, as well as a
$\sqrt n$-approximation using $\mathrm{poly}(n, \log d)$ space for $p = 1$.
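For $p = 2$, a standard identity gives some intuition for why outputting $\mathbf x$ is harder than estimating its cost: $\mathbf x^* = \mathbf A^\top\mathbf y$ with $\mathbf y = (\mathbf A\mathbf A^\top)^+\mathbf b$, so each coordinate $x_i^* = \mathbf a_i^\top\mathbf y$ depends on its own column and on an $n$-dimensional vector that is only known once the stream has ended. The sketch below therefore makes two passes: the first builds $\mathbf y$, the second emits $x_i^*$ column by column. A naive one-pass version would have to retain all $d$ columns, which fits the intuition behind the $\Omega(d)$ lower bound stated above but is not its proof. The name `two_pass_l2_solution` and the `column_stream` callable (which must return a fresh iterator over the columns on each call) are hypothetical.

```python
import numpy as np

def two_pass_l2_solution(column_stream, b):
    """Minimum-norm solution of Ax = b from two passes over the columns of A.

    column_stream: hypothetical zero-argument callable returning a fresh
    iterator over the columns a_1, ..., a_d (each a length-n array).
    """
    n = b.shape[0]
    # Pass 1: accumulate the n x n Gram matrix A A^T.
    M = np.zeros((n, n))
    for a in column_stream():
        M += np.outer(a, a)
    # y = (A A^T)^+ b is only n numbers, but it is known only after pass 1.
    y = np.linalg.pinv(M) @ b
    # Pass 2: x_i = a_i^T y needs column i a second time.
    return np.array([a @ y for a in column_stream()])
```

For a matrix `A` held in memory, `two_pass_l2_solution(lambda: iter(A.T), b)` works, since iterating over `A.T` yields the columns of `A`; the result agrees with the minimum-norm solution returned by `np.linalg.lstsq`.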
Supplementary Material: pdf
Primary Area: optimization
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7681