SiGeo: Sub-One-Shot NAS via Information Theory and Geometry of Loss Landscape

22 Sept 2023 (modified: 11 Feb 2024), Submitted to ICLR 2024
Primary Area: general machine learning (i.e., none of the above)
Keywords: Neural Architecture Search, Zero-shot NAS, Sub-one-shot NAS, Loss Landscape, Convergence Analysis, Generalization Capacity
TL;DR: We introduce the "sub-one-shot" NAS paradigm, bridging zero-shot and one-shot methods, and present SiGeo, a theoretically grounded proxy outperforming competitors while achieving a 2-3 times improvement in search efficiency.
Abstract: Neural Architecture Search (NAS) has become a widely used tool for automating neural network design. While one-shot NAS methods have successfully reduced computational requirements, they often require extensive training. On the other hand, zero-shot NAS utilizes training-free proxies to evaluate a candidate architecture's test performance but has two limitations: (1) difficulty in taking advantage of strictly increasing information and (2) unreliable performance, particularly in complex domains like recommendation systems, due to the multi-modal data inputs and complex architecture configurations. To synthesize the benefits of both methods, we introduce a "sub-one-shot" paradigm that serves as a bridge between zero-shot and one-shot NAS. In sub-one-shot NAS, the supernet is trained using only a small subset of the training data, a phase we refer to as "warm-up." Within this framework, we present SiGeo, a proxy for NAS, founded on a novel theoretical framework that connects the supernet warm-up with the efficacy of the proxy. Extensive experiments have demonstrated that SiGeo, even without the benefit of supernet warm-up, consistently outperforms state-of-the-art zero-shot NAS competitors on various established NAS benchmarks. When warmed up, it can achieve comparable performance to one-shot NAS methods, but with a significant reduction (approximately 60%) in computational costs.
Supplementary Material: zip
Submission Number: 5974