Abstract: Neural Architecture Search (NAS), aiming at automatically designing network architectures by machines, is expected to bring about a new revolution in machine learning. Despite these high expectations, the effectiveness and efficiency of existing NAS solutions are unclear, with some recent works going so far as to suggest that many existing NAS solutions are no better than random architecture selection. The ineffectiveness of NAS solutions may be attributed to inaccurate architecture evaluation. Specifically, to speed up NAS, recent works have proposed under-training different candidate architectures in a large search space concurrently by using shared network parameters; however, this has resulted in incorrect architecture ratings and furthered the ineffectiveness of NAS.
In this work, we propose to modularize the large search space of NAS into blocks to ensure that the potential candidate architectures are fully trained; this reduces the representation shift caused by the shared parameters and leads to the correct rating of the candidates. Thanks to the block-wise search, we can also evaluate all of the candidate architectures within each block. Moreover, we find that the knowledge of a network model lies not only in the network parameters but also in the network architecture. Therefore, we propose to distill the neural architecture (DNA) knowledge from a teacher model to supervise our block-wise architecture search, which significantly improves the effectiveness of NAS. Remarkably, the performance of our searched architectures has exceeded the teacher model, demonstrating the practicability of our method. Finally, our method achieves a state-of-the-art 78.4% top-1 accuracy on ImageNet in a mobile setting. All of our searched models along with the evaluation code are available at https://github.com/changlin31/DNA.
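To make the block-wise distillation idea concrete, the sketch below (our own minimal reconstruction, not the authors' released code) supervises each candidate student block with the output features of the corresponding teacher block, so that every block can be rated independently. The block boundaries, layer shapes, and the MSE objective are illustrative assumptions rather than details taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical teacher split into blocks (e.g., stages of a pretrained network);
# its parameters are frozen and only serve as supervision.
teacher_blocks = nn.ModuleList([
    nn.Sequential(nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU()),
    nn.Sequential(nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU()),
])
for p in teacher_blocks.parameters():
    p.requires_grad_(False)

# Hypothetical student candidate blocks, one searchable block per teacher block.
student_blocks = nn.ModuleList([
    nn.Sequential(nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU()),
    nn.Sequential(nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU()),
])

def blockwise_distillation_loss(x, teacher_blocks, student_blocks):
    """Sum of per-block losses; each student block is supervised independently,
    taking the teacher's previous-stage feature as its input."""
    loss = 0.0
    feat = x
    for t_block, s_block in zip(teacher_blocks, student_blocks):
        with torch.no_grad():
            target = t_block(feat)      # teacher output feature for this block
        pred = s_block(feat.detach())   # student block sees the teacher's input feature
        loss = loss + F.mse_loss(pred, target)
        feat = target                   # next block starts from the teacher's features
    return loss

# Usage: one training step on random data standing in for ImageNet images.
x = torch.randn(2, 3, 32, 32)
loss = blockwise_distillation_loss(x, teacher_blocks, student_blocks)
loss.backward()
print(float(loss))
```

Because each block is trained and evaluated against its own teacher target, candidates in different blocks no longer interfere through shared parameters, which is the motivation for the block-wise decomposition described above.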