Submission Type: Short paper (4 pages)
Keywords: Tabular foundation models, TabPFN
TL;DR: This is a preview of TabPFN-2.5, a new state-of-the-art tabular foundation model for up to 100k data points.
Abstract: The first tabular foundation model, TabPFN, and its successor TabPFNv2 have impacted tabular AI substantially, with dozens of methods building on them and hundreds of applications across different use cases. This paper previews TabPFN-2.5, the next generation of our tabular foundation model, built for datasets with up to 50,000 data points and 2,000 features, a 20× increase in data cells compared to TabPFNv2. TabPFN-2.5 is now the leading method on the industry-standard benchmark TabArena (which contains datasets with up to 100,000 training data points), substantially outperforming tuned tree-based models and matching the accuracy of AutoGluon 1.4, a complex four-hour tuned ensemble that even includes the previous TabPFNv2. Remarkably, default TabPFN-2.5 has a 100% win rate against default XGBoost on small to medium-sized classification datasets (≤10,000 data points, 500 features) and an 87% win rate on larger datasets with up to 100,000 samples and 2,000 features (85% for regression). For production use cases, we introduce a new distillation engine that converts TabPFN-2.5 into a compact MLP or tree ensemble, preserving most of its accuracy while delivering orders-of-magnitude lower latency and plug-and-play deployment. This new release will immediately strengthen the performance of the many applications and methods already built on the TabPFN ecosystem.
Submission Number: 47