Betty: An Automatic Differentiation Library for Multilevel Optimization

04 Oct 2022, 23:09 (modified: 26 Nov 2022, 09:49) | NeurIPS 2022 Workshop MetaLearn Poster | Readers: Everyone
Keywords: Multilevel Optimization, Automatic Differentiation, Bilevel Optimization, Meta Learning, Software Library
TL;DR: We develop a scalable, easy-to-use, and modular automatic differentiation library for multilevel optimization, based on a novel interpretation of multilevel optimization as a dataflow graph.
Abstract: Gradient-based multilevel optimization (MLO) has gained attention as a framework for studying numerous problems, ranging from hyperparameter optimization and meta-learning to neural architecture search and reinforcement learning. However, gradients in MLO, which are obtained by composing best-response Jacobians via the chain rule, are notoriously difficult to implement and memory/compute intensive. We take an initial step towards addressing these difficulties by introducing Betty, a software library for large-scale MLO. At its core, we devise a novel dataflow graph for MLO, which allows us to (1) develop efficient automatic differentiation for MLO that reduces the computational complexity from $\mathcal{O}(d^3)$ to $\mathcal{O}(d^2)$, (2) incorporate systems support such as mixed-precision and data-parallel training for scalability, and (3) facilitate implementation of MLO programs of arbitrary complexity while allowing a modular interface for diverse algorithmic and systems design choices. We empirically demonstrate that Betty can be used to implement an array of MLO programs, and we observe up to an 11% increase in test accuracy, a 14% decrease in GPU memory usage, and a 20% decrease in training wall time over existing implementations on multiple benchmarks. We also showcase that Betty enables scaling MLO to models with hundreds of millions of parameters.
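
To make "composing best-response Jacobians via the chain rule" concrete, here is a minimal sketch on a scalar two-level problem. It does not use Betty's API. The toy objectives, the function names, and the constants `a` and `b` are all assumptions chosen for illustration. The lower level minimizes 0.5*(w - a)^2 + 0.5*lam*w^2 over w, so its best response is w*(lam) = a / (1 + lam); the upper level minimizes 0.5*(w*(lam) - b)^2 over lam.

```python
def best_response(lam, a):
    # Closed-form minimizer of the (assumed) lower-level objective
    # 0.5*(w - a)**2 + 0.5*lam*w**2 with respect to w.
    return a / (1.0 + lam)


def upper_loss(w, b):
    # (Assumed) upper-level objective, evaluated at the lower-level solution.
    return 0.5 * (w - b) ** 2


def hypergradient(lam, a, b):
    # Chain rule: dL/dlam = dL/dw * dw*/dlam.
    w_star = best_response(lam, a)
    dL_dw = w_star - b
    # Best-response Jacobian from the implicit function theorem:
    # dw*/dlam = -(d2g/dw2)^(-1) * d2g/(dw dlam), where g is the lower-level objective.
    hess_ww = 1.0 + lam      # d2g/dw2
    mixed = w_star           # d2g/(dw dlam), evaluated at w*
    dw_dlam = -mixed / hess_ww
    return dL_dw * dw_dlam


def finite_difference(lam, a, b, eps=1e-6):
    # Central-difference check that differentiates through the best response.
    up = upper_loss(best_response(lam + eps, a), b)
    down = upper_loss(best_response(lam - eps, a), b)
    return (up - down) / (2.0 * eps)


if __name__ == "__main__":
    print(hypergradient(0.5, 2.0, 1.0))
    print(finite_difference(0.5, 2.0, 1.0))
```

In this scalar case the best-response Jacobian is a single number. With d-dimensional parameters the same term involves a d-by-d Hessian, which is where the implementation and compute cost described in the abstract comes from.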