On Weight-Sharing and Bilevel Optimization in Architecture Search

Mikhail Khodak; Liam Li; Maria-Florina Balcan; Ameet Talwalkar

25 Sept 2019 (modified: 05 May 2023) | ICLR 2020 Conference Blind Submission | Readers: Everyone

TL;DR: An analysis of the learning and optimization structures of architecture search in neural networks and beyond.

Abstract: Weight-sharing, the simultaneous optimization of multiple neural networks using the same parameters, has emerged as a key component of state-of-the-art neural architecture search. However, its success is poorly understood and often found to be surprising. We argue that, rather than just being an optimization trick, the weight-sharing approach is induced by the relaxation of a structured hypothesis space, and introduces new algorithmic and theoretical challenges as well as applications beyond neural architecture search. Algorithmically, we show how the geometry of ERM for weight-sharing requires greater care when designing gradient-based minimization methods and apply tools from non-convex non-Euclidean optimization to give general-purpose algorithms that adapt to the underlying structure. We further analyze the learning-theoretic behavior of the bilevel optimization solved by practical weight-sharing methods. Next, using kernel configuration and NLP feature selection as case studies, we demonstrate how weight-sharing applies to the architecture search generalization of NAS and effectively optimizes the resulting bilevel objective. Finally, we use our optimization analysis to develop a simple exponentiated gradient method for NAS that aligns with the underlying optimization geometry and matches state-of-the-art approaches on CIFAR-10.
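
For readers unfamiliar with the term, the following is a minimal sketch of a generic exponentiated gradient update on the probability simplex. It is not the paper's algorithm: the function name `exponentiated_gradient_step`, the fixed `costs` vector, and the learning rate are all hypothetical choices for a toy linear loss, meant only to show how a multiplicative update keeps architecture-style mixture weights positive and normalized.

```python
import math


def exponentiated_gradient_step(theta, grad, lr):
    """One exponentiated gradient update on the probability simplex.

    theta: current weights (positive, summing to 1)
    grad:  gradient of the loss with respect to theta
    lr:    step size (hypothetical value chosen by the caller)
    """
    # Multiplicative update. Subtracting the smallest gradient entry only
    # improves numerical stability; it cancels out after normalization.
    g_min = min(grad)
    unnormalized = [t * math.exp(-lr * (g - g_min)) for t, g in zip(theta, grad)]
    total = sum(unnormalized)
    # Renormalize so the weights stay on the simplex.
    return [u / total for u in unnormalized]


# Toy assumption: three candidate operations with fixed costs, so the loss
# is linear in theta and its gradient is simply the cost vector.
costs = [0.9, 0.2, 0.5]
theta = [1.0 / 3] * 3  # start from the uniform mixture
for _ in range(50):
    theta = exponentiated_gradient_step(theta, costs, lr=0.5)
```

In this toy setting the weight mass concentrates on the lowest-cost candidate while every weight stays strictly positive, which is the property that distinguishes the multiplicative update from a plain additive gradient step followed by projection.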

Keywords: neural architecture search, weight-sharing, bilevel optimization, non-convex optimization, hyperparameter optimization, model selection
