Geometry-Aware Gradient Algorithms for Neural Architecture Search

Liam Li; Mikhail Khodak; Nina Balcan; Ameet Talwalkar

Published: 12 Jan 2021, Last Modified: 22 Jun 2025

Venue: ICLR 2021 Spotlight

Readers: Everyone

Keywords: neural architecture search, automated machine learning, weight-sharing, optimization

Abstract: Recent state-of-the-art methods for neural architecture search (NAS) exploit gradient-based optimization by relaxing the problem into continuous optimization over architectures and shared weights, a noisy process that remains poorly understood. We argue for the study of single-level empirical risk minimization to understand NAS with weight-sharing, reducing the design of NAS methods to devising optimizers and regularizers that can quickly obtain high-quality solutions to this problem. Invoking the theory of mirror descent, we present a geometry-aware framework that exploits the underlying structure of this optimization to return sparse architectural parameters, leading to simple yet novel algorithms that enjoy fast convergence guarantees and achieve state-of-the-art accuracy on the latest NAS benchmarks in computer vision. Notably, we exceed the best published results for both CIFAR and ImageNet on both the DARTS search space and NAS-Bench-201; on the latter we achieve near-oracle-optimal performance on CIFAR-10 and CIFAR-100. Together, our theory and experiments demonstrate a principled way to co-design optimizers and continuous relaxations of discrete NAS search spaces.
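Illustrative sketch: the abstract invokes mirror descent to obtain sparse architectural parameters. One standard instance of mirror descent on the probability simplex is the exponentiated-gradient update, which multiplies each weight by an exponential of its negative gradient and renormalizes. The snippet below is a minimal sketch of that generic update, not the authors' implementation; the function name, the toy linear loss, the cost values, and the step size are all assumptions chosen for illustration.

```python
import math


def exponentiated_gradient_step(theta, grad, lr):
    """One entropic mirror descent step on the probability simplex.

    theta: list of strictly positive weights summing to 1 (hypothetical
           mixture weights over candidate operations on one edge).
    grad:  gradient of the loss with respect to theta.
    lr:    step size (assumed value, not taken from the paper).
    """
    # Multiplicative update theta_i * exp(-lr * grad_i), done in log space.
    logits = [math.log(t) - lr * g for t, g in zip(theta, grad)]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(logits)
    weights = [math.exp(v - m) for v in logits]
    total = sum(weights)
    # Renormalize so the result stays on the simplex.
    return [w / total for w in weights]


# Toy problem (assumption): linear loss <costs, theta> over four operations,
# so the gradient with respect to theta is simply the cost vector.
costs = [0.9, 0.2, 0.7, 0.5]
theta = [0.25] * 4
for _ in range(200):
    theta = exponentiated_gradient_step(theta, costs, lr=0.5)
```

On this toy loss the weights concentrate on the lowest-cost operation while remaining a valid probability vector, which is the kind of sparsity the abstract attributes to choosing the geometry of the update. A plain additive gradient step followed by projection would also stay on the simplex, but it does not have this multiplicative form.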

One-sentence Summary: Studying the right single-level optimization geometry yields state-of-the-art methods for NAS.

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics

Supplementary Material: zip

Code: [liamcli/gaea_release](https://github.com/liamcli/gaea_release)

Data: [CIFAR-10](https://paperswithcode.com/dataset/cifar-10), [CIFAR-100](https://paperswithcode.com/dataset/cifar-100), [ImageNet](https://paperswithcode.com/dataset/imagenet), [NAS-Bench-1Shot1](https://paperswithcode.com/dataset/nas-bench-1shot1), [NAS-Bench-201](https://paperswithcode.com/dataset/nas-bench-201)

Community Implementations: [3 code implementations](https://www.catalyzex.com/paper/geometry-aware-gradient-algorithms-for-neural/code)
