TabPFN for Data-Scarce Industrial Settings

Published: 18 Nov 2025, Last Modified: 18 Nov 2025
Venue: AITD@EurIPS 2025 Poster
License: CC BY 4.0
Submission Type: Short paper (4 pages)
Keywords: TabPFN, industrial setting, data scarcity
TL;DR: We analyze how sample size affects TabPFN performance in industrial settings and compare it with strong tabular baseline models.
Abstract: Tabular foundation models such as TabPFN v2 perform in-context learning by conditioning on a small labeled support set and a query instance, enabling fast adaptation to heterogeneous tabular regression tasks without per-dataset training. Many industrial applications operate in a tiny-sample regime due to cost and process constraints. We analyze TabPFN for regression under extreme label scarcity, position it against established tabular baselines, and trace how predictive performance depends on dataset size. Our study covers support sets starting from as few as 5 labeled points per task, on an industrial steelmaking regression problem and on public benchmarks. In steelmaking, in-process measurements of the target are rarely feasible, and intermediate targets are embedded in delayed end-of-process data. Because data collection is slow and data are scarce, effective use requires integrating heterogeneous datasets across vessels, processes, and plants. A central finding is that TabPFN's dependence on support set size varies widely with dataset quality and information content. While most benchmark tasks achieve satisfactory performance with support sets larger than 20 samples, the investigated industrial datasets require at least 100 samples to consistently outperform a naive mean baseline. We discuss implications for deploying in-context tabular models in the low-data regime and show dataset size dependencies for various competitive tabular regression methods.
Submission Number: 12
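
The comparison described in the abstract (test error as a function of support set size, measured against a naive mean baseline) can be illustrated with a minimal sketch. This is not the authors' evaluation code: the names `MeanBaseline`, `RidgeStandIn`, and `support_size_curve` are hypothetical, the data are synthetic, and a closed-form ridge regressor stands in for TabPFN so that the example runs with NumPy alone. Any regressor with a scikit-learn-style `fit`/`predict` interface could presumably be passed as `make_model` instead.

```python
import numpy as np


class MeanBaseline:
    """Naive baseline: always predicts the mean of the support targets."""

    def fit(self, X, y):
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, X):
        return np.full(len(X), self.mean_)


class RidgeStandIn:
    """Closed-form ridge regression, used here only as a stand-in for TabPFN."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, X, y):
        Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
        A = Xb.T @ Xb + self.alpha * np.eye(Xb.shape[1])
        self.coef_ = np.linalg.solve(A, Xb.T @ y)
        return self

    def predict(self, X):
        Xb = np.hstack([X, np.ones((len(X), 1))])
        return Xb @ self.coef_


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def support_size_curve(make_model, X, y, sizes, n_repeats=20, n_test=200, seed=0):
    """Return {size: (mean model RMSE, mean baseline RMSE)} over random splits."""
    rng = np.random.default_rng(seed)
    results = {}
    for n in sizes:
        model_scores, base_scores = [], []
        for _ in range(n_repeats):
            perm = rng.permutation(len(X))
            test_idx, pool = perm[:n_test], perm[n_test:]
            support = pool[:n]  # the small labeled support set
            model = make_model().fit(X[support], y[support])
            base = MeanBaseline().fit(X[support], y[support])
            model_scores.append(rmse(y[test_idx], model.predict(X[test_idx])))
            base_scores.append(rmse(y[test_idx], base.predict(X[test_idx])))
        results[n] = (float(np.mean(model_scores)), float(np.mean(base_scores)))
    return results


# Synthetic regression task (an assumption, not the paper's data).
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 5))
y = X @ np.array([2.0, -1.0, 0.5, 0.0, 0.0]) + rng.normal(scale=0.5, size=1000)

curve = support_size_curve(RidgeStandIn, X, y, sizes=[5, 20, 100])
for n, (model_rmse, base_rmse) in curve.items():
    print(f"support={n:4d}  model RMSE={model_rmse:.3f}  mean-baseline RMSE={base_rmse:.3f}")
```

Averaging over repeated random support sets matters at the smallest sizes, where a single draw of 5 points can make either the model or the baseline look arbitrarily good or bad.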