HPO-B: A Large-Scale Reproducible Benchmark for Black-Box HPO based on OpenML

Published: 11 Oct 2021, Last Modified: 20 Oct 2024
NeurIPS 2021 Datasets and Benchmarks Track (Round 2)
Readers: Everyone
Keywords: Meta-dataset, Hyperparameter Optimization, OpenML, Transfer-learning, Meta-learning
TL;DR: A Large-Scale Reproducible Benchmark for Black-Box HPO based on OpenML
Abstract: Hyperparameter optimization (HPO) is a core problem for the machine learning community and remains largely unsolved due to the significant computational resources required to evaluate hyperparameter configurations. As a result, a series of recent works has focused on transfer learning for quickly tuning hyperparameters on a dataset. Unfortunately, the community does not have a common large-scale benchmark for comparing HPO algorithms. Instead, the de facto practice consists of empirical protocols on arbitrary small-scale meta-datasets that vary inconsistently across publications, making reproducibility a challenge. To resolve this major bottleneck and enable a fair and fast comparison of black-box HPO methods on a level playing field, we propose HPO-B, a new large-scale benchmark in the form of a collection of meta-datasets. Our benchmark is assembled and preprocessed from the OpenML repository and consists of 176 search spaces (algorithms) evaluated sparsely on 196 datasets, with a total of 6.4 million hyperparameter evaluations. To ensure reproducibility on our benchmark, we detail explicit experimental protocols, splits, and evaluation measures for comparing methods in both non-transfer and transfer-learning HPO.
Supplementary Material: zip
URL: https://github.com/releaunifreiburg/HPO-B
Contribution Process Agreement: Yes
Dataset Url: https://github.com/releaunifreiburg/HPO-B
License: The dataset is released under license CC-BY.
Author Statement: Yes
Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/hpo-b-a-large-scale-reproducible-benchmark/code)