Neural Architecture Search via Ensemble-based Knowledge Distillation

Published: 28 Jan 2022, Last Modified: 13 Feb 2023, ICLR 2022 Submitted
Keywords: NAS, Knowledge Distillation, ImageNet
Abstract: Neural Architecture Search (NAS) automatically searches for well-performing network architectures in a given search space. One-shot NAS methods improve training efficiency by sharing weights among the candidate architectures in the search space, but unfortunately suffer from insufficient parameterization of each architecture due to interference from the other architectures. Recent works attempt to alleviate this insufficient-parameterization problem with knowledge distillation, which lets the learning of all architectures (students) be guided by the knowledge (i.e., parameters) of a better-parameterized network (teacher), which can be either a pre-trained model (e.g., ResNet50) or some of the most accurate architectures searched out so far. However, all these methods fall short of providing a sufficiently strong teacher: they either depend on a pre-trained network that is not the best fit for the NAS task, or the selected teachers are themselves still undertrained and inaccurate. In this paper, we take the first step by proposing an ensemble-based knowledge distillation method for NAS, called EnNAS, which assembles a strong teacher by aggregating the most diverse set of architectures searched out so far (high diversity yields highly accurate ensembles); by doing so, EnNAS delivers high-quality knowledge distillation with an outstanding teacher network (i.e., the ensemble network) throughout the search. Compared with existing works on the real-world ImageNet dataset, EnNAS improves the top-1 accuracy of the searched architectures by 1.2% on average and by up to 3.3%.
One-sentence Summary: We propose a diversity-driven ensemble method that generates teachers to boost knowledge distillation for training one-shot NAS.
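
As a rough, non-authoritative illustration of the ensemble-teacher idea described in the abstract, the PyTorch-style sketch below averages the softened predictions of a set of diverse teacher networks and distills them into a sampled student architecture. The temperature T, the loss weight alpha, and the uniform averaging of teacher outputs are assumptions for illustration and are not taken from the paper.

# Illustrative sketch only: ensemble-teacher knowledge distillation for one-shot NAS.
# The loss form, temperature T, and weighting alpha are assumptions, not EnNAS's exact recipe.
import torch
import torch.nn.functional as F

def ensemble_kd_loss(student_logits, teacher_logits_list, labels, T=4.0, alpha=0.5):
    """Distill an ensemble of diverse teacher networks into a sampled student architecture."""
    # Average the teachers' softened predictions to form the ensemble teacher.
    teacher_probs = torch.stack(
        [F.softmax(t / T, dim=-1) for t in teacher_logits_list]
    ).mean(dim=0)

    # KL divergence between the student's softened predictions and the ensemble teacher.
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        teacher_probs,
        reduction="batchmean",
    ) * (T * T)

    # Standard cross-entropy against the ground-truth labels.
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce

In this sketch, a more diverse set of teachers yields a smoother, more accurate averaged target distribution, which is the intuition behind assembling the ensemble from the most diverse architectures searched out so far.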