TabRepo: A Large Scale Repository of Tabular Model Evaluations and its AutoML Applications

Published: 30 Apr 2024, Last Modified: 01 Aug 2024, AutoML 2024 (ABCD Track), CC BY 4.0
Keywords: Tabular prediction, AutoML, transfer learning, Tabulated dataset, portfolio, ensemble
TL;DR: We release a large-scale dataset of tabular model predictions that allows ensembles to be simulated very cheaply and can be used to improve the tabular state of the art by a large margin.
Abstract: We introduce TabRepo, a new dataset of tabular model evaluations and predictions. TabRepo contains the predictions and metrics of 1206 models evaluated on 200 regression and classification datasets. We illustrate the benefit of our dataset in multiple ways. First, we show that it enables analyses such as comparing Hyperparameter Optimization against current AutoML systems while also accounting for ensembling, at no additional cost, by using precomputed model predictions. Second, we show that our dataset can be readily leveraged to perform transfer learning. In particular, we show that applying standard transfer-learning techniques makes it possible to outperform current state-of-the-art tabular systems in accuracy, runtime, and latency.
Submission Checklist: Yes
Broader Impact Statement: Yes
Paper Availability And License: Yes
Code Of Conduct: Yes
Optional Meta-Data For Green-AutoML:
Steps For Environmental Footprint Reduction During Development: Using precomputed results.
CPU Hours: 220000
GPU Hours: 2000
Evaluation Metrics: Yes
Community Implementations: https://github.com/autogluon/tabrepo
Submission Number: 15
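
Illustrative sketch (not from the paper or the TabRepo codebase): the abstract states that precomputed model predictions make it possible to evaluate ensembles at essentially no cost. The snippet below shows one way this can work in principle, using greedy ensemble selection with replacement, a common technique chosen here as an assumption rather than confirmed as the authors' method. The function name `greedy_ensemble_weights`, the synthetic data, and the squared-error metric are all hypothetical; no TabRepo API is used.

```python
import numpy as np


def greedy_ensemble_weights(val_preds, y_val, n_iterations=10):
    """Greedy ensemble selection with replacement on cached predictions.

    val_preds: array of shape (n_models, n_samples) holding precomputed
               validation predictions (hypothetical stand-in for stored data).
    y_val:     array of shape (n_samples,) with the validation targets.
    Returns one weight per model; the weights sum to 1.
    """
    n_models = val_preds.shape[0]
    counts = np.zeros(n_models, dtype=int)
    running_sum = np.zeros(y_val.shape, dtype=float)
    for step in range(1, n_iterations + 1):
        # Ensemble prediction that would result from adding each model next.
        candidates = (running_sum[None, :] + val_preds) / step
        errors = np.mean((candidates - y_val[None, :]) ** 2, axis=1)
        best = int(np.argmin(errors))
        counts[best] += 1
        running_sum += val_preds[best]
    return counts / counts.sum()


# Synthetic regression example: three hypothetical models whose predictions
# are the target plus independent noise of different magnitudes.
rng = np.random.default_rng(0)
y_val = rng.normal(size=200)
val_preds = np.stack(
    [y_val + rng.normal(scale=s, size=200) for s in (0.5, 0.6, 2.0)]
)

weights = greedy_ensemble_weights(val_preds, y_val)
ensemble_pred = weights @ val_preds

single_mse = np.mean((val_preds - y_val) ** 2, axis=1)
ensemble_mse = np.mean((ensemble_pred - y_val) ** 2)
print("single-model MSE:", single_mse)
print("ensemble weights:", weights)
print("ensemble MSE:", ensemble_mse)
```

No model is trained or run inside the loop: every candidate ensemble is scored purely by averaging arrays that already exist, which is why a repository of stored predictions makes this kind of simulation cheap.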