Abstract: As the scale of foundation models continues to grow, efficient hyperparameter optimization (HPO) becomes increasingly critical for managing the substantial computational resources required for training and downstream usage. Traditional HPO methods are often prohibitively expensive in these scenarios, motivating the need for more efficient approaches. Classical methods treat HPO as a black-box optimization problem; in contrast, gray-box HPO methods, which incorporate additional information about the setup, have emerged as a promising direction for more efficient optimization. In this work, we propose an HPO method for neural networks that uses logged checkpoints of trained weights to guide future hyperparameter selections. Our method, Forecasting Model Search (FMS), embeds weights into a Gaussian process deep kernel surrogate model, using a permutation-invariant graph metanetwork to be data-efficient with the logged network weights. We open-source our code (https://github.com/NVlabs/forecasting-model-search).
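The abstract's surrogate model is a deep-kernel Gaussian process whose kernel operates on learned embeddings of checkpoints and hyperparameters. Below is a minimal sketch, assuming a PyTorch setup, of such a surrogate; it is not the authors' implementation. The permutation-invariant graph metanetwork encoder is stood in for by a placeholder MLP (`phi`), and all class, function, and variable names here are illustrative rather than taken from the FMS codebase.

```python
# Hedged sketch of a deep-kernel GP surrogate: inputs are mapped through a
# learned feature extractor, and an RBF kernel is applied in feature space.
import torch
import torch.nn as nn


class DeepKernelGP(nn.Module):
    def __init__(self, in_dim: int, feat_dim: int = 16, noise: float = 1e-2):
        super().__init__()
        # Feature extractor phi(x); FMS would use a graph metanetwork over
        # logged weights here, an MLP is only a stand-in.
        self.phi = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(),
            nn.Linear(64, feat_dim),
        )
        self.log_lengthscale = nn.Parameter(torch.zeros(()))
        self.log_outputscale = nn.Parameter(torch.zeros(()))
        self.log_noise = nn.Parameter(torch.tensor(float(noise)).log())

    def kernel(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        # RBF kernel on learned features:
        # k(a, b) = s^2 * exp(-||phi(a) - phi(b)||^2 / (2 * l^2))
        fa, fb = self.phi(a), self.phi(b)
        d2 = torch.cdist(fa, fb).pow(2)
        return self.log_outputscale.exp() * torch.exp(
            -0.5 * d2 / self.log_lengthscale.exp() ** 2
        )

    def nll(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        # Negative log marginal likelihood (up to an additive constant),
        # used to fit both kernel hyperparameters and the feature extractor.
        n = x.shape[0]
        K = self.kernel(x, x) + self.log_noise.exp() * torch.eye(n)
        L = torch.linalg.cholesky(K)
        alpha = torch.cholesky_solve(y.unsqueeze(-1), L)
        return 0.5 * (y @ alpha.squeeze(-1)) + torch.log(torch.diagonal(L)).sum()

    def predict(self, x_train, y_train, x_test):
        # GP posterior mean/variance at x_test given observations (x_train, y_train).
        n = x_train.shape[0]
        K = self.kernel(x_train, x_train) + self.log_noise.exp() * torch.eye(n)
        L = torch.linalg.cholesky(K)
        k_star = self.kernel(x_train, x_test)
        alpha = torch.cholesky_solve(y_train.unsqueeze(-1), L)
        mean = (k_star.t() @ alpha).squeeze(-1)
        v = torch.cholesky_solve(k_star, L)
        var = self.kernel(x_test, x_test).diagonal() - (k_star * v).sum(0)
        return mean, var.clamp_min(1e-9)


# Hypothetical usage: each row of x concatenates hyperparameters with a
# checkpoint embedding; y holds the observed validation metrics.
#   model = DeepKernelGP(in_dim=x.shape[1])
#   opt = torch.optim.Adam(model.parameters(), lr=1e-2)
#   for _ in range(200):
#       opt.zero_grad(); loss = model.nll(x, y); loss.backward(); opt.step()
```

The posterior mean and variance returned by `predict` are what a Bayesian-optimization acquisition function would consume to pick the next hyperparameter configuration to try.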
DOI: 10.1007/978-3-031-91979-4_8