DARTS without a Validation Set: Optimizing the Marginal Likelihood

Miroslav Fil; Binxin Ru; Clare Lyle; Yarin Gal

Published: 10 Dec 2021, Last Modified: 05 May 2023, NeurIPS 2021 Workshop MetaLearn Poster, Readers: Everyone

Keywords: NAS, Bayesian deep learning, DARTS, bi-level optimization, marginal likelihood, AutoML

TL;DR: Searching for architectures in DARTS directly on the training set via the marginal likelihood, instead of using a validation set, fundamentally changes the search behavior while improving performance.

Abstract: The success of neural architecture search (NAS) has historically been limited by excessive compute requirements. While modern weight-sharing NAS methods such as DARTS are able to finish the search in single-digit GPU days, extracting the final best architecture from the shared weights is notoriously unreliable. Training-Speed-Estimate (TSE), a recently developed generalization estimator with a Bayesian marginal likelihood interpretation, has previously been used in place of the validation loss for gradient-based optimization in DARTS. This prevents the DARTS skip-connection collapse, which significantly improves performance on NASBench-201 and the original DARTS search space. We extend those results by applying various DARTS diagnostics and show several unusual behaviors arising from not using a validation set. Furthermore, our experiments yield concrete examples of the depth gap and topology selection in DARTS having a strongly negative impact on the search performance, even though both have received limited attention in the literature compared to operation selection.

Contribution Process Agreement: Yes

Author Revision Details: Improved clarity of writing and formatting of references

Poster Session Selection: Poster session #3 (16:50 UTC)
