An Asymptotic Theory of Random Search for Hyperparameters in Deep Learning

ICLR 2025 Conference Submission 13008 Authors

28 Sept 2024 (modified: 27 Nov 2024) · ICLR 2025 Conference Submission · CC BY 4.0
Keywords: hyperparameters, hyperparameter search, hyperparameter tuning, random search, evaluation
TL;DR: We develop an asymptotic theory of random search that yields new tools for deep learning research.
Abstract: Scale is essential in modern deep learning; however, greater scale brings a greater need to make experiments efficient. Often, most of the effort is spent finding good hyperparameters, so we should consider exactly how much to spend searching for them. Unfortunately, this requires a better understanding of hyperparameter search, and how it converges, than we currently have. An emerging approach to such questions is *the tuning curve*: the test score as a function of tuning effort. In theory, the tuning curve predicts how the score will increase as search continues; in practice, current estimators use nonparametric assumptions that, while robust, cannot extrapolate beyond the current search step. Such extrapolation requires stronger assumptions, realistic assumptions designed for hyperparameter tuning. Thus, we derive an asymptotic theory of random search. Its central result is a new limit theorem that explains random search in terms of four interpretable quantities: the effective number of hyperparameters, the variance due to random seeds, the concentration of probability around the optimum, and the performance of the best hyperparameters. These four quantities parametrize a new probability distribution, *the noisy quadratic*, which characterizes the behavior of random search. We test our theory against three practical deep learning scenarios, including pretraining in vision and fine-tuning in language. Based on 1,024 iterations of search in each, we confirm that our theory achieves an excellent fit. Using the theory, we construct the first confidence bands that extrapolate the tuning curve. Moreover, once fitted, each parameter of the noisy quadratic answers an important question, such as what the best possible performance is. So that others may use these tools in their research, we make them available at (URL redacted).
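To make the notion of a tuning curve concrete, here is a minimal sketch (not the authors' released tooling, and not the paper's new parametric theory) of the standard nonparametric estimator that the abstract contrasts against: the expected best score after k iterations of random search, computed from a batch of i.i.d. search results via order statistics. The function and variable names (`expected_tuning_curve`, `scores`) are illustrative assumptions, and the Beta-distributed scores stand in for real validation accuracies.

```python
# Minimal sketch: nonparametric tuning-curve estimate from random-search results.
# Note this estimator cannot extrapolate past the observed number of iterations,
# which is the limitation the paper's asymptotic theory is designed to address.
import numpy as np

def expected_tuning_curve(scores: np.ndarray, max_k: int) -> np.ndarray:
    """Estimate E[best score after k iterations] for k = 1..max_k.

    With the n observed scores sorted ascending, the probability that the
    i-th order statistic is the maximum of k draws from the empirical
    distribution is (i/n)^k - ((i-1)/n)^k.
    """
    n = len(scores)
    sorted_scores = np.sort(scores)
    i = np.arange(1, n + 1)
    curve = np.empty(max_k)
    for k in range(1, max_k + 1):
        weights = (i / n) ** k - ((i - 1) / n) ** k
        curve[k - 1] = np.dot(weights, sorted_scores)
    return curve

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in for 1,024 random-search results (e.g., validation accuracies).
    scores = rng.beta(8, 2, size=1024)
    print(expected_tuning_curve(scores, max_k=10))
```

The paper's contribution, by contrast, is to fit a parametric distribution (the noisy quadratic) whose four parameters allow the curve, and confidence bands around it, to be extrapolated beyond the search budget already spent.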
Primary Area: optimization
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 13008