carps: A Framework for Comparing N Hyperparameter Optimizers on M Benchmarks

TMLR Paper5243 Authors

29 Jun 2025 (modified: 03 Sept 2025) · Under review for TMLR · CC BY 4.0
Abstract: Hyperparameter Optimization (HPO) is crucial for developing well-performing machine learning models. To ease prototyping and benchmarking of HPO methods, we propose carps, a benchmark framework for Comprehensive Automated Research Performance Studies that allows evaluating N optimizers on M benchmark tasks. In this first release of carps, we focus on the four most important HPO task types: blackbox, multi-fidelity, multi-objective, and multi-fidelity-multi-objective. With 3,336 tasks from 5 community benchmark collections and 28 variants of 9 optimizer families, we offer the largest go-to library to date for evaluating and comparing HPO methods. The carps framework relies on a purpose-built, lightweight interface that glues together optimizers and benchmark tasks. It also features an analysis pipeline, facilitating the evaluation of optimizers on benchmarks. However, navigating such a huge number of tasks while developing and comparing methods can be computationally infeasible. To address this, we obtain a subset of representative tasks by minimizing the star discrepancy of the subset in the space spanned by the full set. As a result, we propose an initial subset of 10 to 30 diverse tasks for each task type, and include functionality to re-compute subsets as more benchmarks become available, enabling efficient evaluations. We also establish a first set of baseline results on these tasks as a reference for future comparisons. With carps (https://anonymous.4open.science/r/CARP-S-860C), we take an important step toward the standardization of HPO evaluation.
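A brief aside for readers unfamiliar with the selection criterion: the star discrepancy mentioned in the abstract is the standard quasi-Monte Carlo measure of how uniformly a point set covers the unit cube, so a low-discrepancy subset of tasks is spread evenly across the space spanned by the full task set. For a point set P = {x_1, ..., x_n} in [0,1]^d (here, tasks mapped into a normalized feature space; the exact encoding carps uses is an assumption on our part), the standard definition is

\[
D_n^*(P) \;=\; \sup_{b \in [0,1]^d} \left| \frac{\bigl|\{\, x_i \in P : x_i \in [0, b) \,\}\bigr|}{n} \;-\; \prod_{j=1}^{d} b_j \right| ,
\]

i.e., the largest deviation between the empirical fraction of points falling in an anchored box [0, b) and that box's volume; choosing the subset that minimizes D_n^* therefore yields tasks that cover the full set's space as evenly as possible.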
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission:
- Addressed reviewer HeNc26's comments by adding more explanations and details.
- Updated carps to support the new SyneTune API, which meant removing the SyneTune optimizers BO, KDE, SyncMOBSTER, DEHB, BO-MO-LS, and BO-MO-RS, and adding TPE, BOHB, ASHA, CQR, and BoTorch. The benchmark results have been updated accordingly and do not change the previous insights.
Assigned Action Editor: ~Kevin_Swersky1
Submission Number: 5243