Abstract: The $\ell_p$ regression problem takes as input a matrix $A \in \mathbb{R}^{n \times d}$, a vector $b \in \mathbb{R}^n$, and a number $p \in [1, \infty)$, and it returns as output a number $Z$ and a vector $x_{\mathrm{OPT}} \in \mathbb{R}^d$ such that $Z = \min_{x \in \mathbb{R}^d} \|Ax - b\|_p = \|Ax_{\mathrm{OPT}} - b\|_p$. In this paper, we construct coresets and obtain an efficient two-stage sampling-based approximation algorithm for the very overconstrained ($n \gg d$) version of this classical problem, for all $p \in [1, \infty)$. The first stage of our algorithm non-uniformly samples $\hat{r}_1 = O(36^p d^{\max\{p/2+1,\, p\}+1})$ rows of $A$ and the corresponding elements of $b$, and then it solves the $\ell_p$ regression problem on the sample; we prove this is an 8-approximation. The second stage of our algorithm uses the output of the first stage to resample $\hat{r}_1/\epsilon^2$ constraints, and then it solves the $\ell_p$ regression problem on the new sample; we prove this is a $(1 + \epsilon)$-approximation. Our algorithm unifies, improves upon, and extends the existing algorithms for special cases of $\ell_p$ regression, namely $p = 1, 2$ [10, 13]. In the course of proving our result, we develop two concepts, well-conditioned bases and subspace-preserving sampling, that are of independent interest.
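To make the two-stage idea concrete, here is a minimal Python sketch for the toy case d = 1, where the single column of A, normalized by its ℓp norm, serves as a trivially well-conditioned basis. It is an illustration under stated assumptions, not the paper's algorithm: the function names, the sample sizes `r1` and `r2`, the synthetic data, the ternary-search solver, and the specific second-stage rule (mixing the first-stage probabilities with residual-based ones) are all choices made for this sketch.

```python
import random

def lp_cost(a, b, x, p, w=None):
    """Return sum_i w_i * |a_i * x - b_i|^p, the p-th power of the weighted l_p norm."""
    if w is None:
        w = [1.0] * len(a)
    return sum(wi * abs(ai * x - bi) ** p for ai, bi, wi in zip(a, b, w))

def solve_1d(a, b, p, w=None, iters=100):
    """Minimise the convex 1-D cost by ternary search.

    The minimiser lies between the smallest and largest ratio b_i / a_i,
    because every term |a_i x - b_i|^p is minimised at its own ratio.
    """
    ratios = [bi / ai for ai, bi in zip(a, b) if ai != 0.0]
    lo, hi = min(ratios), max(ratios)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if lp_cost(a, b, m1, p, w) <= lp_cost(a, b, m2, p, w):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2

def sample_rows(a, b, probs, rng):
    """Keep row i independently with probability probs[i]; weight it by 1 / probs[i]."""
    kept = [i for i, q in enumerate(probs) if rng.random() < q]
    return ([a[i] for i in kept], [b[i] for i in kept],
            [1.0 / probs[i] for i in kept])

def two_stage(a, b, p, r1, r2, seed=0):
    """Hypothetical two-stage sampler for d = 1 (assumed rule, see lead-in)."""
    rng = random.Random(seed)
    # Stage 1: probabilities proportional to |a_i|^p, i.e. to the p-th power
    # of the rows of the normalised column.
    norm_a = sum(abs(ai) ** p for ai in a)
    p1 = [min(1.0, r1 * abs(ai) ** p / norm_a) for ai in a]
    sa, sb, sw = sample_rows(a, b, p1, rng)
    x1 = solve_1d(sa, sb, p, sw)
    # Stage 2: also favour rows with a large residual under the stage-1 solution.
    res = [abs(ai * x1 - bi) ** p for ai, bi in zip(a, b)]
    total = sum(res) or 1.0
    p2 = [min(1.0, max(q, r2 * ri / total)) for q, ri in zip(p1, res)]
    sa, sb, sw = sample_rows(a, b, p2, rng)
    x2 = solve_1d(sa, sb, p, sw)
    return x1, x2

# Synthetic data (an assumption for illustration): b is roughly 3 * a plus noise.
data_rng = random.Random(1)
n, p = 2000, 1.5
a = [data_rng.uniform(0.5, 2.0) for _ in range(n)]
b = [3.0 * ai + data_rng.gauss(0.0, 1.0) for ai in a]

x_opt = solve_1d(a, b, p)                      # solve on all n rows
x1, x2 = two_stage(a, b, p, r1=50, r2=500)     # solve on samples only
z_opt = lp_cost(a, b, x_opt, p) ** (1 / p)     # l_p norm of the full residual
z1 = lp_cost(a, b, x1, p) ** (1 / p)
z2 = lp_cost(a, b, x2, p) ** (1 / p)
print(z1 / z_opt, z2 / z_opt)
```

The printed ratios compare the full-data residual norm of each sampled solution with that of the full-data solution. The reweighting by 1 / probs[i] keeps the sampled cost an unbiased estimate of the full cost, which is what lets a small sample stand in for all n rows.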
External IDs:dblp:conf/soda/DasguptaDHKM08