TabRepo: A Large Scale Repository of Tabular Model Evaluations and its AutoML Applications

David Salinas; Nick Erickson

Published: 30 Apr 2024, Last Modified: 01 Aug 2024, Venue: AutoML 2024 (ABCD Track), License: CC BY 4.0

Keywords: Tabular prediction, AutoML, transfer learning, Tabulated dataset, portfolio, ensemble

TL;DR: We release a large-scale dataset of tabular model predictions that makes it possible to simulate ensembles very cheaply and can be used to improve on the tabular state of the art by a large margin.

Abstract: We introduce TabRepo, a new dataset of tabular model evaluations and predictions. TabRepo contains the predictions and metrics of 1206 models evaluated on 200 regression and classification datasets. We illustrate the benefit of our dataset in multiple ways. First, we show that it enables analyses such as comparing Hyperparameter Optimization against current AutoML systems, while also accounting for ensembling at no additional cost by using precomputed model predictions. Second, we show that our dataset can be readily leveraged to perform transfer learning. In particular, we show that applying standard transfer-learning techniques makes it possible to outperform current state-of-the-art tabular systems in accuracy, runtime and latency.
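To illustrate why precomputed predictions make ensembling cheap, here is a minimal sketch, assuming cached validation probabilities are available as a NumPy array of shape (n_models, n_rows, n_classes). It does not use the TabRepo API: the function names `log_loss` and `greedy_ensemble`, the synthetic data, and the choice of Caruana-style greedy selection with replacement are all assumptions made for illustration. The point is only that scoring a candidate ensemble reduces to averaging stored arrays, with no model training or inference.

```python
import numpy as np


def log_loss(y_true, proba, eps=1e-12):
    """Mean negative log-likelihood for integer labels and class probabilities."""
    p = np.clip(proba[np.arange(len(y_true)), y_true], eps, 1.0)
    return float(-np.mean(np.log(p)))


def greedy_ensemble(val_preds, y_val, n_steps=10):
    """Greedy selection with replacement over cached predictions (hypothetical helper).

    val_preds has shape (n_models, n_rows, n_classes).
    Returns per-model weights that sum to 1.
    """
    n_models = val_preds.shape[0]
    counts = np.zeros(n_models, dtype=int)
    running_sum = np.zeros_like(val_preds[0])
    for step in range(1, n_steps + 1):
        # Score each candidate as if it were added to the current ensemble.
        losses = [
            log_loss(y_val, (running_sum + val_preds[m]) / step)
            for m in range(n_models)
        ]
        best = int(np.argmin(losses))
        counts[best] += 1
        running_sum += val_preds[best]
    return counts / counts.sum()


# Synthetic stand-in for cached predictions (not real TabRepo data).
rng = np.random.default_rng(0)
n_models, n_rows, n_classes = 5, 200, 3
y_val = rng.integers(0, n_classes, size=n_rows)
logits = rng.normal(size=(n_models, n_rows, n_classes))
# Give each model a different amount of signal on the true class.
strength = np.linspace(0.0, 2.0, n_models)
logits[:, np.arange(n_rows), y_val] += strength[:, None]
val_preds = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)

weights = greedy_ensemble(val_preds, y_val, n_steps=10)
# Weighted average of the cached predictions, shape (n_rows, n_classes).
ensemble_pred = np.tensordot(weights, val_preds, axes=1)
ensemble_loss = log_loss(y_val, ensemble_pred)
```

Each selection step costs only a handful of array additions, which is what makes it practical to evaluate many ensemble configurations once the per-model predictions have been stored.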

Submission Checklist: Yes

Broader Impact Statement: Yes

Paper Availability And License: Yes

Code Of Conduct: Yes

Steps For Environmental Footprint Reduction During Development: Using precomputed results.

CPU Hours: 220000

GPU Hours: 2000

Evaluation Metrics: Yes

Community Implementations: https://github.com/autogluon/tabrepo

Submission Number: 15
