Fast MNAS: Uncertainty-aware Neural Architecture Search with Lifelong Learning

28 Sept 2020 (modified: 05 May 2023) · ICLR 2021 Conference Blind Submission
Keywords: Neural Architecture Search, AutoML, Reinforcement Learning (RL)
Abstract: Sampling-based neural architecture search (NAS) guarantees better convergence than gradient-based approaches, yet it suffers from huge computational cost due to the rollout bottleneck: exhaustive training of every sampled architecture on proxy tasks. This work provides a general pipeline to accelerate both the rollout process and the RL optimization in sampling-based NAS. It is motivated by the observation that both architecture knowledge and parameter knowledge can be transferred between different experiments and even different tasks. We first introduce an uncertainty-aware critic (value function) into PPO to exploit the architecture knowledge from previous experiments, which stabilizes training and reduces search time by a factor of 4. We further propose a lifelong knowledge pool together with a block similarity function to exploit lifelong parameter knowledge, reducing search time by another factor of 2; to our knowledge, this is the first work to introduce block-level weight sharing in RL-based NAS, and the block similarity function guarantees a 100% hit ratio under strict fairness. Finally, we show that a simple off-policy correction factor enables a replay buffer in the RL optimization and further halves search time. Experiments on the MNAS search space show that the proposed FNAS accelerates the standard RL-based NAS process by $\sim$10x (e.g., $\sim$256 2x2 TPUv2-days / 20,000 GPU-hours $\rightarrow$ 2,000 GPU-hours for MNAS) while achieving better performance on various vision tasks.
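The abstract names an off-policy correction factor that makes replaying past rollouts valid under PPO, but does not spell out its form. Below is a minimal sketch assuming one standard choice: a capped importance ratio between the current policy and the behavior policy that generated the replayed samples, applied inside the usual PPO clipped surrogate. All names (`corrected_policy_loss`, `rho_max`, etc.) are illustrative and not from the paper.

```python
import torch

def corrected_policy_loss(logp_new, logp_behavior, advantages,
                          clip_eps=0.2, rho_max=2.0):
    """PPO clipped surrogate with an off-policy correction for replayed data.

    logp_new:      log pi_theta(a|s) under the current policy.
    logp_behavior: log mu(a|s) under the (stale) policy that produced
                   the samples stored in the replay buffer.
    advantages:    advantage estimates for the replayed samples.
    """
    # Importance ratio between current and behavior policy; capping it
    # (rho_max) bounds the variance introduced by stale replay data.
    rho = torch.exp(logp_new - logp_behavior).clamp(max=rho_max)

    # Standard PPO clipped surrogate, applied to the reweighted samples.
    surr1 = rho * advantages
    surr2 = torch.clamp(rho, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(surr1, surr2).mean()
```

With such a correction, architecture rollouts sampled by earlier controller versions can be reused for several PPO updates instead of being discarded after one on-policy step, which is consistent with the halving of search time the abstract reports.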
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
One-sentence Summary: We propose FNAS, which accelerates the standard RL-based NAS process by $\sim$10x and achieves better performance on various vision tasks.
Reviewed Version (pdf): https://openreview.net/references/pdf?id=Owuhhc0XcY