Zero-Cost Operation Scoring in Differentiable Architecture Search

Lichuan Xiang; Łukasz Dudziak; Mohamed S Abdelfattah; Thomas Chun Pong Chau; Nicholas Donald Lane; Hongkai Wen

Published: 28 Jan 2022, Last Modified: 13 Feb 2023; ICLR 2022 Submitted; Readers: Everyone

Abstract: Differentiable neural architecture search (NAS) has attracted significant attention in recent years due to its ability to quickly discover promising architectures of deep neural networks even in very large search spaces. Despite its success, many differentiable NAS methods lack robustness and may degenerate to trivial architectures with excessive parameter-free operations such as skip connections, thus leading to inferior performance. In fact, selecting operations based on the magnitude of architectural parameters was recently proven to be fundamentally wrong, showcasing the need to rethink how operation scoring and selection occur in differentiable NAS. To this end, we formalize and analyze a fundamental component of differentiable NAS: local "operation scoring" that occurs at each choice of operation. When comparing existing operation scoring functions, we find that existing methods can be viewed as inexact proxies for accuracy. We also find that existing methods perform poorly when analyzed empirically on NAS benchmarks. From this perspective, we introduce new training-free proxies to the context of differentiable NAS, and show that we can significantly speed up the search process while improving accuracy on multiple search spaces. We take inspiration from zero-cost proxies that were recently studied in the context of sample-based NAS but were shown to degrade significantly for larger search spaces like DARTS. Our novel "perturbation-based zero-cost operation scoring" (Zero-Cost-PT) improves searching time and accuracy compared to the best available differentiable architecture search for many search space sizes, including very large ones. Specifically, we are able to improve accuracy compared to the best current method (DARTS-PT) on the DARTS CNN search space while being over 40x faster (total searching time of 25 minutes on a single GPU). Our code is available at: https://github.com/avail-upon-acceptance.

One-sentence Summary: A new perturbation-based NAS method for supernetworks using zero-cost proxies, achieving SOTA accuracy while being >40x faster
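
The idea of perturbation-based operation scoring with a training-free proxy, as described in the abstract, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the supernet is reduced to a dictionary of candidate operations per edge, `toy_proxy` and `TOY_CONTRIBUTION` are hypothetical stand-ins for a real zero-cost proxy, and the selection rule (keep the operation whose removal lowers the proxy score the most) is an assumption made for illustration.

```python
from typing import Callable, Dict, List

# Edge name -> candidate operations still active on that edge (toy supernet).
Supernet = Dict[str, List[str]]

# Hypothetical per-operation values; a real zero-cost proxy would be computed
# from an untrained network instead of a lookup table.
TOY_CONTRIBUTION = {"conv3x3": 3.0, "conv1x1": 2.0, "skip": 0.5, "none": 0.0}


def toy_proxy(supernet: Supernet) -> float:
    """Stand-in proxy: sum of the hypothetical values of all active operations."""
    return sum(TOY_CONTRIBUTION[op] for ops in supernet.values() for op in ops)


def perturbation_scores(
    supernet: Supernet, edge: str, proxy: Callable[[Supernet], float]
) -> Dict[str, float]:
    """Score each operation on `edge` by the proxy drop caused by removing it."""
    base = proxy(supernet)
    scores = {}
    for op in supernet[edge]:
        perturbed = dict(supernet)
        perturbed[edge] = [o for o in supernet[edge] if o != op]
        scores[op] = base - proxy(perturbed)
    return scores


def discretize(supernet: Supernet, proxy: Callable[[Supernet], float]) -> Dict[str, str]:
    """Pick one operation per edge, edge by edge (assumed selection rule)."""
    current = {edge: list(ops) for edge, ops in supernet.items()}
    for edge in supernet:
        scores = perturbation_scores(current, edge, proxy)
        best = max(scores, key=scores.get)  # largest drop when removed
        current[edge] = [best]
    return {edge: ops[0] for edge, ops in current.items()}


example_supernet = {
    "e0": ["conv3x3", "skip", "none"],
    "e1": ["conv1x1", "skip"],
}
chosen = discretize(example_supernet, toy_proxy)
```

With the toy values above, `chosen` maps `e0` to `conv3x3` and `e1` to `conv1x1`. No training step appears anywhere in the loop; each score requires only proxy evaluations of the perturbed supernet.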
