Abstract: Sketching for speeding up regression problems involves using a sketching matrix S to quickly find the approximate solution to a linear least squares regression (LLS) problem: given A of size n × d, with n ≫ d, along with b of size n × 1, we seek a vector y with minimal regression error ‖Ay − b‖₂.
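
For illustration only (not part of the original abstract), the following minimal JAX sketch shows one common way to set up sketched LLS, using a dense Gaussian sketching matrix and a normal-equations solve of the small problem; the function name sketched_lls, the sketch size m, and these particular choices are assumptions, and the paper's constructions may differ.

import jax
import jax.numpy as jnp

def sketched_lls(A, b, S):
    # Approximately solve min_y ||A y - b||_2 by solving the smaller
    # sketched problem min_y ||(S A) y - (S b)||_2, here via its normal
    # equations, which keeps the code short and differentiable.
    SA, Sb = S @ A, S @ b
    return jnp.linalg.solve(SA.T @ SA, SA.T @ Sb)

key = jax.random.PRNGKey(0)
kA, kb, kS = jax.random.split(key, 3)
n, d, m = 10_000, 20, 400                        # n >> m >> d
A = jax.random.normal(kA, (n, d))
b = jax.random.normal(kb, (n,))
S = jax.random.normal(kS, (m, n)) / jnp.sqrt(m)  # dense Gaussian sketch
y_sketch = sketched_lls(A, b, S)                 # approximate LLS solution
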
This approximation technique is now standard in data science, and many software systems use sketched regression internally, as a
component. It is often useful to calculate derivatives (for example, gradients for optimization) of such large systems, in which sketched LLS is merely one component. To support Automatic Differentiation (AD) of systems containing sketched
LLS, we consider propagating derivatives
through LLS: both propagating perturbations (forward AD) and gradients (reverse
AD). AD performs accurate differentiation
and is efficient for problems with a huge number of independent variables. Since we use
LLSS (sketched LLS) instead of LLS for reasons of efficiency, propagation of derivatives
also needs to trade accuracy for efficiency,
presumably by sketching. There are two approaches for this: (a) use AD to transform
the code that defines LLSS, or (b) approximate exact derivative propagation through
LLS using sketching methods. We provide
strong bounds on the errors produced due to
these two natural forms of sketching in the
context of AD, giving the first dimensionality reduction analysis for calculating the
derivatives of a sketched computation. Our
results crucially depend on a novel analysis of
the operator norm of a sketched inverse matrix product in this context. Extensive experiments on both synthetic and real-world data demonstrate the efficacy of our sketched gradients.
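
As a hedged illustration of approach (a), applying AD directly to the code that defines LLSS, the snippet below reuses sketched_lls, A, b, and S from the example above and differentiates through the sketched solve with JAX. The downstream scalar loss is hypothetical, chosen only to give reverse mode something to differentiate; it is not the paper's construction.

import jax
import jax.numpy as jnp

def loss(A, b, S):
    y = sketched_lls(A, b, S)   # sketched solve defined in the earlier snippet
    return jnp.sum(y ** 2)      # hypothetical stand-in for a larger system's loss

# Forward AD: propagate an input perturbation (dA, db) through the sketched solve.
dA = jnp.zeros_like(A).at[0, 0].set(1.0)
db = jnp.zeros_like(b)
y_val, dy = jax.jvp(lambda A_, b_: sketched_lls(A_, b_, S), (A, b), (dA, db))

# Reverse AD: gradients of the scalar loss with respect to A and b.
grad_A, grad_b = jax.grad(loss, argnums=(0, 1))(A, b, S)

Approach (b) would instead apply sketching inside the exact derivative propagation for LLS; the error bounds described in the abstract cover both forms of sketching.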