TL;DR: In this paper, we demonstrate that in low-data regimes, merely selecting the early stopping point based on validation performance can lead to a significant overestimation of model performance.
Abstract: Cross-validation is commonly used to estimate machine learning model performance on new samples. However, using it for both hyperparameter selection and error estimation can lead to overestimating model performance, especially with extensive hyperparameter searches that overly tailor models to validation data. We demonstrate that deep learning further amplifies this bias, with even minor model adjustments causing significant overestimation. Our extensive experiments on simulated and real data focus on the bias from early stopping during cross-validation. We find that overestimation intensifies with network depth and is especially severe in small datasets, which are common in physiological signal processing applications.
Selecting the early stopping point during cross-validation can result in ROC-AUC estimates exceeding 90% on random data, and this effect persists across various sample sizes, architectures, and network sizes.
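To make the mechanism concrete, here is a minimal sketch (not the paper's code) of the setup the abstract describes: on pure-noise data, choosing the early-stopping epoch by validation AUC and then reporting that same fold's AUC inflates the cross-validated estimate above chance. The SGDClassifier stand-in, sample sizes, and hyperparameters are illustrative assumptions, not the authors' architecture.

```python
# Illustrative sketch (assumed setup, not the authors' experiment): on random
# data, picking the early-stopping epoch by validation AUC inside CV and then
# reporting that same fold's AUC overestimates performance.
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(0)
n, d, n_epochs = 60, 50, 200          # small sample size, mimicking a low-data regime
X = rng.standard_normal((n, d))       # random features
y = rng.integers(0, 2, n)             # random labels: the true ROC-AUC is 0.5

biased_aucs = []
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, val_idx in cv.split(X, y):
    # A linear model trained incrementally stands in for a deep network;
    # loss="log_loss" requires scikit-learn >= 1.1 (older versions use "log").
    clf = SGDClassifier(loss="log_loss", learning_rate="constant",
                        eta0=0.01, random_state=0)
    epoch_aucs = []
    for _ in range(n_epochs):
        clf.partial_fit(X[train_idx], y[train_idx], classes=[0, 1])
        scores = clf.decision_function(X[val_idx])
        epoch_aucs.append(roc_auc_score(y[val_idx], scores))
    # "Early stopping": keep the epoch with the best AUC on this validation
    # fold, then report that same fold's AUC. This double use of the
    # validation data is the source of the bias.
    biased_aucs.append(max(epoch_aucs))

print(f"CV ROC-AUC with validation-selected stopping: {np.mean(biased_aucs):.2f}")
```

Because the maximum over many epochs of a noisy validation AUC is biased upward, the printed estimate lands well above the true chance level of 0.50 even though the data contain no signal; evaluating the selected model on a held-out test set would recover chance performance.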
Style Files: I have used the style files.
Submission Number: 14