Sampling algorithms for l2 regression and applications

Petros Drineas, Michael W. Mahoney, S. Muthukrishnan

SODA 2006 (modified: 10 Nov 2023)

Abstract: We present and analyze a sampling algorithm for the basic linear-algebraic problem of $\ell_2$ regression. The $\ell_2$ regression (or least-squares fit) problem takes as input a matrix $A \in \mathbb{R}^{n \times d}$ (where we assume $n > d$) and a target vector $b \in \mathbb{R}^n$, and it returns as output $Z = \min_{x \in \mathbb{R}^d} \|b - Ax\|_2$. Also of interest is $x_{opt} = A^+ b$, the minimum-length vector achieving the minimum, where $A^+$ is the Moore-Penrose generalized inverse of $A$. Our algorithm randomly samples $r$ rows from the matrix $A$ and the vector $b$ to construct an induced $\ell_2$ regression problem with many fewer rows but the same number of columns. A crucial feature of the algorithm is its nonuniform sampling probabilities. These probabilities depend in a sophisticated manner on the lengths, i.e., the Euclidean norms, of the rows of the matrix of left singular vectors of $A$, and on the manner in which $b$ lies in the complement of the column space of $A$. Under appropriate assumptions, we show relative-error approximations for both $Z$ and $x_{opt}$. Applications of this sampling methodology are briefly discussed.
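The sample-rescale-solve idea described in the abstract can be illustrated with a small NumPy sketch. It is not the paper's algorithm: the abstract does not state the exact probability formula, so the sketch assumes a hypothetical convex mixture (weight `mix`, an illustrative parameter) of normalized leverage scores, i.e. squared Euclidean norms of the rows of the left-singular-vector matrix U divided by d, and normalized squared entries of the residual `b_perp`, the part of b lying outside the column space of A. It also assumes A has full column rank. The function names are hypothetical. Computing these probabilities with a full SVD costs as much as solving the problem directly, so the sketch only demonstrates the sampling and rescaling steps, not a speedup.

```python
import numpy as np


def exact_l2_regression(A, b):
    """Exact solution: x_opt = A^+ b and Z = ||b - A x_opt||_2."""
    x_opt = np.linalg.pinv(A) @ b
    Z = np.linalg.norm(b - A @ x_opt)
    return x_opt, Z


def sampling_probabilities(A, b, mix=0.5):
    """Hypothetical nonuniform row probabilities (assumes A has full column rank).

    Convex mix of
      (i)  normalized leverage scores ||U_(i)||^2 / d, where U holds the left
           singular vectors of A, and
      (ii) normalized squared entries of b_perp, the part of b outside col(A).
    The mixing weight `mix` is an illustrative assumption, not the paper's formula.
    """
    U, _, _ = np.linalg.svd(A, full_matrices=False)  # U is n x d
    d = A.shape[1]
    lev = np.sum(U**2, axis=1) / d   # sums to 1: U has d orthonormal columns
    b_perp = b - U @ (U.T @ b)       # component of b in the complement of col(A)
    norm2 = b_perp @ b_perp
    if norm2 == 0.0:                 # b lies in col(A): use leverage scores only
        return lev
    return mix * lev + (1.0 - mix) * b_perp**2 / norm2


def sampled_l2_regression(A, b, r, rng, mix=0.5):
    """Sample r rows i.i.d. with replacement according to p, rescale each sampled
    row of A and entry of b by 1/sqrt(r * p_i), and solve the smaller problem."""
    p = sampling_probabilities(A, b, mix)
    idx = rng.choice(A.shape[0], size=r, replace=True, p=p)
    scale = 1.0 / np.sqrt(r * p[idx])
    SA = A[idx] * scale[:, None]     # induced problem: r rows, same d columns
    Sb = b[idx] * scale
    x_tilde, *_ = np.linalg.lstsq(SA, Sb, rcond=None)
    return x_tilde


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, r = 2000, 5, 200
    A = rng.standard_normal((n, d))
    b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)
    x_opt, Z = exact_l2_regression(A, b)
    x_tilde = sampled_l2_regression(A, b, r, rng)
    Z_tilde = np.linalg.norm(b - A @ x_tilde)  # residual of x_tilde on the full problem
    print(f"Z = {Z:.4f}, Z_tilde = {Z_tilde:.4f}, ratio = {Z_tilde / Z:.4f}")
```

Because Z is the minimum over all x, the residual `Z_tilde` of the sampled solution, evaluated on the full problem, can never fall below Z. The ratio `Z_tilde / Z` therefore directly measures the relative error that the abstract refers to.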
