Towards Model Selection using Learning Curve Cross-ValidationDownload PDF

Published: 14 Jul 2021, Last Modified: 05 May 2023AutoML@ICML2021 PosterReaders: Everyone
Keywords: model selection, algorithm selection, learning curves
TL;DR: We propose a novel model selection technique using learning curves, that based on a convexity assumption selects which models to drop. This methods results in runtime reductions, and can be used orthogonal to other advances in AutoML
Abstract: Cross-validation (CV) methods such as leave-one-out cross-validation, k-fold cross-validation, and Monte-Carlo cross-validation estimate the predictive performance of a learner by repeatedly training it on a large portion of the given data and testing on the remaining data. These techniques have two drawbacks. First, they can be unnecessarily slow on large datasets. Second, providing only point estimates, they give almost no insights into the learning process of the validated algorithm. In this paper, we propose a new approach for validation based on learning curves (LCCV). Instead of creating train-test splits with a large portion of training data, LCCV iteratively increases the number of training examples used for training. In the context of model selection, it eliminates models that can be safely dismissed from the candidate pool. We run a large scale experiment on the 67 datasets from the AutoML benchmark, and empirically show that LCCV in over 90\% of the cases leads to similar performance (at most 0.5\% difference) as 10-fold CV, but provides additional insights on the behaviour of a given model. On top of this, LCCV results in runtime reductions between 20% and over 50% on half of the 67 datasets from the AutoML benchmark. This can be incorporated in various AutoML frameworks, to speed up the internal evaluation of candidate models. As such, these results can be used orthogonal to other advances in the field of AutoML.
Ethics Statement: AutoML methods can be used to exacerbate unfair bias. This method aims to speed up algorithm selection, which is a component of AutoML.
Crc Pdf: pdf
Poster Pdf: pdf
Original Version: pdf
4 Replies

Loading