Defying Multi-model Forgetting: Orthogonal Gradient Learning for One-shot Neural Architecture Search

23 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: general machine learning (i.e., none of the above)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: One-shot neural architecture search, Multi-model forgetting, Orthogonal gradient learning
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: One-shot neural architecture search (NAS) trains a single over-parameterized network (termed the supernet) whose subnets, through weight sharing, cover all candidate architectures, thereby greatly reducing the computational budget. However, supernet training in one-shot NAS suffers from multi-model forgetting: some weights of a previously well-trained architecture are overwritten when a newly sampled architecture that shares structures with it is trained. To overcome this issue, we propose an orthogonal gradient learning (OGL) guided supernet training paradigm for one-shot NAS, whose novelty lies in updating the weights of the structures that the current architecture shares with previously trained architectures in a direction orthogonal to the gradient space spanned by those architectures on the shared structures. Moreover, we design a new projection computation that efficiently finds the basis vectors of this gradient space and thereby the orthogonal update direction. We prove the effectiveness of the proposed paradigm in overcoming multi-model forgetting both theoretically and experimentally. Furthermore, we apply the paradigm to two one-shot NAS baselines, and experimental results demonstrate that our approach mitigates multi-model forgetting and enhances the predictive ability of the supernet with remarkable efficiency on popular benchmark datasets.
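The core OGL update described in the abstract can be sketched in a few lines: gradients of previously trained subnets on a shared weight span a subspace, and the current subnet's gradient is projected onto the orthogonal complement of that subspace before the weight update. The NumPy sketch below is illustrative only and not the paper's implementation; names such as `orthonormal_basis`, `project_orthogonal`, and `past_grads` are our own assumptions.

```python
# Minimal sketch of orthogonal-gradient projection for a shared weight,
# assuming the gradients of previously trained subnets have been stored.
import numpy as np

def orthonormal_basis(past_grads, eps=1e-10):
    """Orthonormal basis of the span of past gradients, via thin QR."""
    G = np.stack(past_grads, axis=1)        # (d, k): one column per past gradient
    Q, R = np.linalg.qr(G)                  # Q: (d, k) orthonormal columns
    keep = np.abs(np.diag(R)) > eps         # drop numerically dependent columns
    return Q[:, keep]                       # (d, r) basis of the gradient space

def project_orthogonal(grad, basis):
    """Remove the component of `grad` lying in the span of `basis`."""
    return grad - basis @ (basis.T @ grad)

# Toy usage: update a shared weight without disturbing earlier subnets.
rng = np.random.default_rng(0)
past = [rng.normal(size=8) for _ in range(3)]   # gradients from earlier subnets
B = orthonormal_basis(past)
g = rng.normal(size=8)                          # gradient of the current subnet
g_orth = project_orthogonal(g, B)               # direction used for the update
assert all(abs(p @ g_orth) < 1e-8 for p in past)  # orthogonal to all past gradients
```

Because the projected update is orthogonal to every stored gradient, a first-order step along `g_orth` leaves the loss of each previously trained subnet unchanged to first order, which is the intuition behind mitigating multi-model forgetting.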
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: pdf
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 6907