Abstract: Classical reverse-mode automatic differentiation (AD) imposes only a small
constant-factor overhead in operation count over the original computation, but
has storage requirements that grow, in the worst case, in proportion to the
time consumed by the original computation. This storage blowup can be
ameliorated by checkpointing, a process that reorders application of classical
reverse-mode AD over an execution interval to trade off space vs. time.
Application of checkpointing in a divide-and-conquer fashion to strategically
chosen nested execution intervals can break classical reverse-mode AD into
stages which can reduce the worst-case growth in storage from linear to
sublinear. Doing this has been fully automated only for computations of
particularly simple form, with checkpoints spanning execution intervals
resulting from a limited set of program constructs. Here we show how the
technique can be automated for arbitrary computations. The essential
innovation is to apply the technique at the level of the language
implementation itself, thus allowing checkpoints to span any execution interval.
TL;DR: Compute gradients of a program with O(log t) slowdown and O(log t) increase in space over running a program that runs in O(t) time.
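
To make the space/time tradeoff concrete, the following is a minimal sketch of divide-and-conquer checkpointing by binary bisection, applied to a simple loop of t identical steps. It is an illustration under stated assumptions, not the language-level mechanism the abstract describes: the functions `step`, `step_vjp`, `grad_classical`, and `grad_checkpointed` are hypothetical names, the primal computation is a toy scalar recurrence, and the per-step derivative is written by hand. Recursion depth is used as a proxy for the number of checkpoints held at once.

```python
import math


def step(x):
    # One step of a toy primal computation (an assumption for illustration).
    return x + 1e-3 * math.sin(x)


def step_vjp(x, x_bar_out):
    # Reverse-mode rule for one step: input cotangent from output cotangent.
    return (1.0 + 1e-3 * math.cos(x)) * x_bar_out


def grad_classical(x0, t):
    # Classical reverse mode: record every intermediate state, so storage is O(t).
    tape = []
    x = x0
    for _ in range(t):
        tape.append(x)
        x = step(x)
    x_bar = 1.0
    for x in reversed(tape):
        x_bar = step_vjp(x, x_bar)
    return x_bar, len(tape)


def grad_checkpointed(x0, t):
    # Divide-and-conquer checkpointing by bisection of the execution interval.
    # "live" counts active recursion frames, each holding one checkpointed state.
    stats = {"live": 0, "peak": 0, "steps": 0}

    def reverse(x, n, x_bar):
        # x: state at the start of an interval of n steps.
        # x_bar: cotangent of the state at the end of that interval.
        stats["live"] += 1
        stats["peak"] = max(stats["peak"], stats["live"])
        if n == 1:
            result = step_vjp(x, x_bar)
        else:
            half = n // 2
            # Recompute forward to the midpoint, storing nothing along the way.
            x_mid = x
            for _ in range(half):
                x_mid = step(x_mid)
                stats["steps"] += 1
            # Reverse the second half first, then the first half.
            x_bar = reverse(x_mid, n - half, x_bar)
            result = reverse(x, half, x_bar)
        stats["live"] -= 1
        return result

    return reverse(x0, t, 1.0), stats


if __name__ == "__main__":
    g_classical, stored = grad_classical(0.3, 1024)
    g_checkpointed, stats = grad_checkpointed(0.3, 1024)
    print(g_classical, stored)
    print(g_checkpointed, stats)
```

For t = 1024, the classical version stores 1024 states, while the bisection version reaches a recursion depth of 11 and performs 5120 recomputed forward steps, that is, roughly (t/2) log2 t extra work. Both return the same gradient. Bisection is only one possible choice of split point; other schedules trade recomputation against storage differently.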