Keywords: bilevel optimization, stochastic gradient estimator, Neural Architecture Search, Differentiable NAS
TL;DR: We propose SGE-NAS, a novel differentiable NAS method that combines a stochastic gradient estimator with gradient descent, replacing the previous two-step approximation algorithm and improving search accuracy.
Abstract: Neural architecture search (NAS) has recently attracted increasing attention due to its ability to design deep neural networks automatically. Differentiable NAS methods have predominated due to their search efficiency. However, differentiable NAS methods typically adopt approximate gradient-based methods to solve the underlying bilevel optimization problem. Although second-derivative approximation reduces the cost of Jacobian- and/or Hessian-vector computations, it remains imprecise and time-consuming in practice. In this paper, we revisit the hypergradient of the bilevel optimization problem in NAS, then propose a new optimizer based on a stochastic gradient estimator (SGE) for computing the Jacobian matrix in the hypergradient. The SGE is adaptable to previous differentiable NAS methods and eliminates second-order computation from the optimization process. In experiments on common differentiable NAS benchmarks, the proposed SGE-NAS algorithm outperforms the baseline algorithms. The results demonstrate that SGE-NAS effectively reduces search time and finds models with higher classification accuracy.
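To illustrate the core idea of replacing derivative computation with a stochastic estimator, here is a minimal sketch of a classic zeroth-order (Gaussian-smoothing) gradient estimator, which recovers a gradient from function evaluations alone. This is a generic illustration under our own assumptions (the function `f`, smoothing radius `mu`, and sample count are hypothetical), not the paper's exact SGE-NAS estimator.

```python
import numpy as np

def sge_grad(f, x, mu=1e-4, n_samples=5000, seed=0):
    """Stochastic gradient estimator via Gaussian smoothing (illustrative sketch):
        grad f(x) ~= E_u[ (f(x + mu*u) - f(x)) / mu * u ],  u ~ N(0, I).
    Only forward evaluations of f are needed; no second-order terms appear,
    mirroring the motivation of avoiding Jacobian/Hessian computation."""
    rng = np.random.default_rng(seed)
    fx = f(x)                       # baseline value, reused for every sample
    g = np.zeros_like(x)
    for _ in range(n_samples):
        u = rng.standard_normal(x.shape[0])
        g += (f(x + mu * u) - fx) / mu * u   # single-sample directional estimate
    return g / n_samples            # Monte Carlo average over directions

# Usage: estimate the gradient of a simple quadratic f(x) = ||x||^2,
# whose true gradient is 2x, and compare against the estimate.
x0 = np.array([1.0, -2.0, 0.5])
est = sge_grad(lambda x: float(x @ x), x0)
true = 2.0 * x0
```

In a bilevel setting, the same estimator can in principle stand in for the Jacobian-vector products appearing in the hypergradient, trading exact second-order computation for cheap stochastic forward evaluations.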