Overfitting in Combined Algorithm Selection and Hyperparameter Optimization

Published: 01 Jan 2025, Last Modified: 26 Sept 2025 · IDA 2025 · CC BY-SA 4.0
Abstract: Hyperparameter optimization (HPO) aims to design machine learning algorithms that generalize well to unseen data by repeatedly evaluating hyperparameter configurations using a validation procedure. When the validation performance of these configurations is overly optimistic compared to the performance on an unseen test set, this is referred to as meta-overfitting. We decompose meta-overfitting into two types: (i) selection-based and (ii) adaptive overfitting. Selection-based overfitting occurs when many configurations are tested, which increases the chance of finding a configuration that performs well on the validation set by chance but performs suboptimally on the test set. Adaptive overfitting arises from advanced HPO methods, such as Bayesian optimization, which iteratively utilize validation results of earlier configurations to propose new configurations increasingly tailored to the specific validation set. We provide one of the largest empirical studies of meta-overfitting in the context of HPO for the Combined Algorithm Selection and Hyperparameter Optimization (CASH) problem, analyzing random search and Bayesian optimization on 48 classification and 16 regression datasets using holdout validation. We show evidence of adaptive overfitting in Bayesian optimization for 41 classification datasets and, consistent with prior work, we show that multiclass datasets are less affected by this phenomenon. Additionally, we find that optimization procedures for regression datasets are surprisingly resilient to adaptive overfitting. Furthermore, we explore the effect of various design choices in the validation procedure (i.e., 10-fold cross-validation and varying holdout set sizes) on meta-overfitting.
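The sketch below illustrates the selection-based meta-overfitting gap described in the abstract: random search over a tiny CASH-style space (algorithm choice plus hyperparameters), selection on a holdout validation set, and comparison of the selected configuration's validation score against its score on a separate test set. This is not the paper's experimental protocol; the scikit-learn models, synthetic dataset, search space, and budget are all illustrative assumptions.

```python
# Minimal sketch (not the paper's protocol): estimating the validation-test gap
# of the configuration selected by random search over a tiny CASH-style space.
import random

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

random.seed(0)

# Train / validation / holdout-test split (sizes are illustrative).
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.5, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

def sample_config():
    """Jointly sample an algorithm and its hyperparameters (the CASH space)."""
    if random.random() < 0.5:
        return ("rf", {"n_estimators": random.randint(10, 200),
                       "max_depth": random.randint(2, 20)})
    return ("logreg", {"C": 10 ** random.uniform(-3, 3)})

def fit(config):
    """Instantiate and fit the sampled algorithm on the training split."""
    algo, params = config
    if algo == "rf":
        model = RandomForestClassifier(random_state=0, **params)
    else:
        model = LogisticRegression(max_iter=1000, **params)
    return model.fit(X_train, y_train)

# Random search: keep the configuration with the best validation accuracy.
best_val, best_model = -1.0, None
for _ in range(100):                      # evaluation budget
    model = fit(sample_config())
    val_acc = model.score(X_val, y_val)
    if val_acc > best_val:
        best_val, best_model = val_acc, model

test_acc = best_model.score(X_test, y_test)
# A positive gap indicates meta-overfitting of the selection to the validation set.
print(f"validation={best_val:.3f}  test={test_acc:.3f}  gap={best_val - test_acc:+.3f}")
```

With a larger evaluation budget the validation score of the winner tends to drift upward while its test score does not follow, which is the selection-based effect; replacing random search with a model-based optimizer that reuses earlier validation results would additionally introduce the adaptive effect the abstract describes.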