Efficient Bilevel Optimization with KFAC-Based Hypergradients
TL;DR: We scale bilevel optimization with KFAC-based, curvature-aware hypergradients that outperform unrolling and Neumann/CG solvers, showing that curvature information remains valuable at scale at only modest overhead.
Abstract: Bilevel optimization (BO) is widely applicable to many machine learning problems. However, to scale BO, practitioners often adopt crude approximations such as one-step gradient unrolling or identity/short-Neumann surrogates, which discard curvature information. We build on implicit-function-theorem-based algorithms and propose to incorporate Kronecker-factored approximate curvature (KFAC), yielding curvature-aware hypergradients with a better performance–efficiency trade-off than CG/Neumann methods, while consistently outperforming unrolling. We evaluate our method across diverse tasks, including meta-learning and AI-safety-related problems. On models up to BERT, we show that curvature information is valuable at scale, and that KFAC can provide it with only modest memory and runtime overhead.
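The efficiency claim rests on a standard Kronecker identity: if a layer's curvature is approximated as a Kronecker product, the inverse-curvature-vector product needed for implicit-function-theorem hypergradients reduces to two small linear solves instead of one large one. The following is a minimal NumPy sketch of that identity only; all names, shapes, and the dense reference check are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 4, 3  # layer output / input dimensions (illustrative)

def spd(k):
    """Random symmetric positive-definite matrix, standing in for a
    KFAC factor (e.g. an input or output-gradient covariance)."""
    M = rng.standard_normal((k, k))
    return M @ M.T + k * np.eye(k)

# KFAC-style factored curvature for one layer: H ≈ A ⊗ G,
# with A (n×n) the input-side factor and G (m×m) the output-side factor.
A, G = spd(n), spd(m)
V = rng.standard_normal((m, n))  # a gradient shaped like the layer's weight

# Kronecker identity (column-major vec, symmetric factors):
#   (A ⊗ G)^{-1} vec(V) = vec(G^{-1} V A^{-1})
# so the (mn × mn) solve collapses to one m×m and one n×n solve.
kfac_ihvp = np.linalg.solve(G, V) @ np.linalg.inv(A)

# Reference: the dense solve against the full Kronecker product.
dense = np.linalg.solve(np.kron(A, G), V.ravel(order="F"))
dense = dense.reshape(m, n, order="F")

assert np.allclose(kfac_ihvp, dense)
```

In an actual KFAC hypergradient pipeline the factors would be running estimates accumulated from minibatches rather than random matrices, but the per-layer cost structure shown here is the source of the claimed efficiency over CG/Neumann iterations.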
Submission Number: 1745