YAHPO Gym - An Efficient Multi-Objective Multi-Fidelity Benchmark for Hyperparameter Optimization

Florian Pfisterer; Lennart Schneider; Julia Moosbauer; Martin Binder; Bernd Bischl

Published: 16 May 2022, Last Modified: 05 May 2023; AutoML-Conf 2022 (Main Track); Readers: Everyone

Abstract: When developing and analyzing new hyperparameter optimization methods, it is vital to empirically evaluate and compare them on well-curated benchmark suites. In this work, we propose a new set of challenging and relevant benchmark problems motivated by desirable properties and requirements for such benchmarks. Our new surrogate-based benchmark collection consists of 14 scenarios that in total constitute over 700 multi-fidelity hyperparameter optimization problems, which all enable multi-objective hyperparameter optimization. Furthermore, we empirically compare surrogate-based benchmarks to the more widely-used tabular benchmarks, and demonstrate that the latter may produce unfaithful results regarding the performance ranking of HPO methods. We examine and compare our benchmark collection with respect to defined requirements and propose a single-objective as well as a multi-objective benchmark suite on which we compare 7 single-objective and 7 multi-objective optimizers in a benchmark experiment. Our software is available at https://github.com/slds-lmu/yahpo_gym.
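The distinction the abstract draws between tabular and surrogate-based benchmarks can be illustrated with a minimal sketch. This is not the YAHPO Gym API and not the experiment from the paper: the grid, the toy error function, and the names `tabular_query` and `surrogate_query` are hypothetical. Piecewise-linear interpolation stands in for a fitted surrogate model, and snapping a query to the nearest stored entry is an assumption of this sketch (tabular benchmarks more commonly restrict the search space to the stored configurations).

```python
import bisect

# Hypothetical pre-evaluated data: error at a few values of one hyperparameter.
GRID = [0.0, 0.25, 0.5, 0.75, 1.0]
ERRORS = [(x - 0.3) ** 2 for x in GRID]  # toy stand-in for real training runs


def tabular_query(x):
    """Tabular benchmark: return the stored error of the nearest grid point."""
    nearest = min(range(len(GRID)), key=lambda i: abs(GRID[i] - x))
    return ERRORS[nearest]


def surrogate_query(x):
    """Surrogate benchmark: predict an error for any x in [0, 1].

    Linear interpolation between stored points is used here only as a
    simple stand-in for a regression model fitted to the stored data.
    """
    x = min(max(x, GRID[0]), GRID[-1])  # clamp to the covered range
    hi = min(max(bisect.bisect_right(GRID, x), 1), len(GRID) - 1)
    lo = hi - 1
    w = (x - GRID[lo]) / (GRID[hi] - GRID[lo])
    return (1 - w) * ERRORS[lo] + w * ERRORS[hi]


if __name__ == "__main__":
    for x in (0.2, 0.3):
        print(x, tabular_query(x), surrogate_query(x))
```

In this toy setting the tabular lookup returns the same value for 0.2 and 0.3, because both fall nearest to the stored point 0.25, while the surrogate assigns them different predicted errors. An optimizer that searches a continuous space could therefore be ranked differently under the two benchmark types.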

Keywords: HPO, optimization, benchmarking, multi-fidelity, multi-objective, NAS

One-sentence Summary: We propose a new benchmark collection for multi-fidelity and multi-objective HPO.

Track: Special track for systems, benchmarks and challenges

Reproducibility Checklist: Yes

Broader Impact Statement: Yes

Paper Availability And License: Yes

Code Of Conduct: Yes

Reviewers: Florian Pfisterer is already a reviewer. Bernd Bischl is already a senior area chair.

CPU Hours: 3349

GPU Hours: 1080

TPU Hours: 0

Evaluation Metrics: Yes

Main Paper And Supplementary Material: pdf

Steps For Environmental Footprint Reduction During Development: Instead of tuning and fitting surrogate models for each individual benchmark problem, we fit our surrogates on whole benchmark scenarios (multiple instances and multiple targets).

Estimated CO2e Footprint: 381
