- Abstract: Neural architecture search (NAS) automatically searches for architectures for given tasks, e.g., image classification and language modeling. Improving search efficiency and effectiveness has attracted increasing attention in recent years. However, few efforts have been devoted to understanding the generated architectures, particularly the commonality these architectures may share. In this paper, we first summarize the common connection pattern of NAS architectures. We show empirically and theoretically that this common connection pattern contributes to a smooth loss landscape and more accurate gradient information, and therefore to fast convergence. As a consequence, NAS algorithms tend to select architectures with this common connection pattern during architecture search. However, we show that the selected architectures with the common connection pattern may not necessarily generalize better than other candidate architectures in the same search space, and therefore further improvement is possible by revising existing NAS algorithms.
- Keywords: Neural Architecture Search, connection pattern, optimization, convergence, Lipschitz smoothness, gradient variance, generalization