Abstract: Sketching for speeding up regression problems involves using a sketching matrix S to quickly find the approximate solution to a linear least squares regression (LLS) problem: given A of size n × d, with n ≫ d, along with b of size n × 1, we seek a vector y with minimal regression error ‖Ay − b‖₂.
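
For illustration only (not part of the original abstract), the following minimal JAX sketch shows one common way to set up sketched LLS, using a dense Gaussian sketching matrix and a normal-equations solve of the small problem; the function name sketched_lls, the sketch size m, and these particular choices are assumptions, and the paper's constructions may differ.

import jax
import jax.numpy as jnp

def sketched_lls(A, b, S):
    # Approximately solve min_y ||A y - b||_2 by solving the smaller
    # sketched problem min_y ||(S A) y - (S b)||_2, here via its normal
    # equations, which keeps the code short and differentiable.
    SA, Sb = S @ A, S @ b
    return jnp.linalg.solve(SA.T @ SA, SA.T @ Sb)

key = jax.random.PRNGKey(0)
kA, kb, kS = jax.random.split(key, 3)
n, d, m = 10_000, 20, 400                        # n >> m >> d
A = jax.random.normal(kA, (n, d))
b = jax.random.normal(kb, (n,))
S = jax.random.normal(kS, (m, n)) / jnp.sqrt(m)  # dense Gaussian sketch
y_sketch = sketched_lls(A, b, S)                 # approximate LLS solution
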
This approximation technique is now standard in data science, and many software systems use sketched regression internally, as a
component. It is often useful to calculate derivatives (for example, gradients for optimization) of such large systems, in which sketched LLS is merely one component. To support Automatic Differentiation (AD) of systems containing sketched
LLS, we consider propagating derivatives
through LLS: both propagating perturbations (forward AD) and gradients (reverse
AD). AD performs accurate differentiation
and is efficient for problems with a huge number of independent variables. Since we use
LLSS (sketched LLS) instead of LLS for reasons of efficiency, propagation of derivatives
also needs to trade accuracy for efficiency,
presumably by sketching. There are two approaches for this: (a) use AD to transform
the code that defines LLSS, or (b) approximate exact derivative propagation through
LLS using sketching methods. We provide
strong bounds on the errors produced due to
these two natural forms of sketching in the
context of AD, giving the first dimensionality reduction analysis for calculating the
derivatives of a sketched computation. Our
results crucially depend on a novel analysis of
the operator norm of a sketched inverse matrix product in this context. Extensive experiments on both synthetic and real-world data demonstrate the efficacy of our sketched gradients.
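
As a hedged illustration of approach (a), applying AD directly to the code that defines LLSS, the snippet below reuses sketched_lls, A, b, and S from the example above and differentiates through the sketched solve with JAX. The downstream scalar loss is hypothetical, chosen only to give reverse mode something to differentiate; it is not the paper's construction.

import jax
import jax.numpy as jnp

def loss(A, b, S):
    y = sketched_lls(A, b, S)   # sketched solve defined in the earlier snippet
    return jnp.sum(y ** 2)      # hypothetical stand-in for a larger system's loss

# Forward AD: propagate an input perturbation (dA, db) through the sketched solve.
dA = jnp.zeros_like(A).at[0, 0].set(1.0)
db = jnp.zeros_like(b)
y_val, dy = jax.jvp(lambda A_, b_: sketched_lls(A_, b_, S), (A, b), (dA, db))

# Reverse AD: gradients of the scalar loss with respect to A and b.
grad_A, grad_b = jax.grad(loss, argnums=(0, 1))(A, b, S)

Approach (b) would instead apply sketching inside the exact derivative propagation for LLS; the error bounds described in the abstract cover both forms of sketching.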