On the Hyperparameter Loss Landscapes of Machine Learning Algorithms

Mingyu Huang; Ke Li

21 Sept 2023 (modified: 11 Feb 2024). Submitted to ICLR 2024.

Primary Area: visualization or interpretation of learned representations

Keywords: Landscape analysis, hyperparameter optimization, exploratory analysis

Abstract: Hyperparameter optimization (HPO) is often formulated as an expensive black-box optimization problem. Despite the recent success of a plethora of HPO algorithms, little is known about the intricate interplay between model hyperparameters (HPs) and the resulting losses, especially across different scenarios. In this paper, we aim to shed light on this black box by conducting a comprehensive fitness landscape analysis (FLA) of the HP loss landscapes of ML models under (i) training and test setups, and across different (ii) fidelities, (iii) datasets, and (iv) models. We do so by developing a dedicated landscape analysis framework that combines visual and quantitative measures, characterizing both the topological structures and the configuration rankings of the landscapes. We apply this framework to analyze $1,476$ HP loss landscapes of $5$ ML models on $63$ datasets, with over $11$ million model evaluations at different fidelities. Our empirical results reveal a universal picture of HP loss landscapes. In this picture, landscapes feature a fairly smooth and neutral terrain in which configurations are clustered with respect to their losses; there is a large plateau consisting of prominent configurations, and the landscape becomes flatter around the optimum. We also show that landscapes of different fidelities and datasets share considerable similarities that can be exploited to accelerate HPO, whereas test landscapes can deviate significantly from training landscapes due to overfitting.
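
To make the two kinds of measures named in the abstract more concrete, the following is a minimal sketch, not the authors' framework. It assumes a toy two-HP grid with synthetic quadratic losses, a hypothetical neighborhood rule (one grid step in one HP), a count of strict local optima as a stand-in for a topological measure, and the Spearman rank correlation between a "training" and a "test" landscape as a stand-in for a configuration-ranking measure. All function names and the loss functions are illustrative assumptions.

```python
from itertools import product


def neighbors(config, grid_sizes):
    """Yield configurations that differ from `config` by one step in one HP."""
    for i, size in enumerate(grid_sizes):
        for step in (-1, 1):
            j = config[i] + step
            if 0 <= j < size:
                yield config[:i] + (j,) + config[i + 1:]


def local_optima(losses, grid_sizes):
    """Return configurations whose loss is strictly below all neighbors' losses."""
    return [c for c in losses
            if all(losses[c] < losses[n] for n in neighbors(c, grid_sizes))]


def ranks(values):
    """Return 1-based ranks, assigning the average rank to tied values."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


# Synthetic landscapes (assumed for illustration): the "test" optimum is
# shifted by one grid step relative to the "training" optimum.
grid_sizes = (5, 5)
configs = list(product(range(5), range(5)))
train = {c: (c[0] - 2) ** 2 + (c[1] - 2) ** 2 for c in configs}
test = {c: (c[0] - 1) ** 2 + (c[1] - 2) ** 2 + 0.5 for c in configs}

train_optima = local_optima(train, grid_sizes)
test_optima = local_optima(test, grid_sizes)
rho = spearman([train[c] for c in configs], [test[c] for c in configs])
print(train_optima, test_optima, rho)
```

In this toy setting each landscape has a single local optimum, and a rank correlation below 1 indicates that the two landscapes order the configurations differently, which is the kind of training/test deviation the abstract refers to.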

Submission Number: 3833