- Keywords: batching, vectorization, compiler, stack, recursion, data-dependent control flow
- TL;DR: Automatically batching programs with recursion and data-dependent control flow.
- Abstract: We present a general approach to batching arbitrary computations for accelerators such as GPUs. We show orders-of-magnitude speedups using our method on the No-U-Turn Sampler (NUTS), a workhorse algorithm in Bayesian statistics. The central challenge of batching NUTS and other Markov chain Monte Carlo algorithms is data-dependent control flow and recursion. We overcome this by mechanically transforming a single-example implementation into a form that explicitly tracks the current program point for each batch member, and only steps forward those at the same place. We present two different batching algorithms: a simpler, previously published one that inherits recursion from the host Python, and a more complex, novel one that implements recursion directly and can batch across it. We implement these batching methods as a general program transformation on Python source. Both the batching system and the NUTS implementation presented here are available as part of the popular TensorFlow Probability software package.
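The core idea of tracking a per-member program point and stepping only the members that share it can be illustrated with a small, self-contained sketch. This is not the TensorFlow Probability implementation; the toy program (double `x`, loop until `x > 10`) and all names here are hypothetical, chosen only to show how masked, program-counter-driven stepping batches data-dependent control flow:

```python
import numpy as np

def batched_run(x, max_steps=100):
    """Run a toy program with a data-dependent loop on a whole batch at once.

    Each batch member carries its own "program counter" (pc). At every step
    we pick one program point and advance only the members currently at it,
    using vectorized masked updates instead of per-member Python control flow.
    """
    x = np.asarray(x, dtype=float)
    pc = np.zeros(len(x), dtype=int)  # program point of each batch member
    DONE = 2
    for _ in range(max_steps):
        if np.all(pc == DONE):
            break
        # Schedule the program point shared by the most unfinished members.
        active = pc[pc != DONE]
        point = np.bincount(active).argmax()
        mask = pc == point
        if point == 0:               # block 0: x = x * 2
            x[mask] *= 2.0
            pc[mask] = 1
        elif point == 1:             # block 1: data-dependent branch
            big = mask & (x > 10)
            pc[big] = DONE           # members with x > 10 terminate
            pc[mask & ~big] = 0      # the rest loop back to block 0
    return x

print(batched_run([1.0, 3.0, 7.0]))  # each member loops a different number of times
```

Members that finish early simply sit at the `DONE` program point while the rest keep stepping, which is what lets a single vectorized loop serve batch members whose control flow has diverged.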