Utilizing TabPFN for Multi-Instance Data with Scarce Labels

Published: 18 Nov 2025, Last Modified: 18 Nov 2025
Venue: AITD@EurIPS 2025 Poster
License: CC BY 4.0
Submission Type: Short paper (4 pages)
Keywords: TabPFN, HDLSS, clustering
TL;DR: We propose a cluster-based preprocessing scheme for TabPFN that enables inference on high-dimensional multi-instance inputs.
Abstract: Tabular data is abundant in critical applications such as science, healthcare, finance, energy, and many other industries, making advances in tabular learning highly influential and interesting for the research community. However, heavy industry applications often present a special class of tabular regression problems that is not commonly studied. These multi-instance single-target tabular data problems originate from the difficulty and cost of taking regular measurements during a production process. In this setting, we have to deal with high-dimensional inputs combined with scarce labels. While foundation models such as TabPFN show strong results on suitable datasets, their applicability and performance on multi-instance single-target data are limited by memory and runtime constraints when the number of instances grows. In this paper, we propose a cluster-based dimensionality reduction that compresses multi-instance measurements by splitting them according to the most relevant cluster constructed from the training set. This approach reduces computational overhead while preserving predictive performance, enabling inference for multi-instance datasets. Our experiments demonstrate that the proposed method extends the practical reach of TabPFN, achieving improved performance across multiple datasets.
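The abstract does not spell out the compression step, so the following is only a minimal sketch of one way a cluster-based reduction of multi-instance inputs could look, not the authors' method. It assumes that clusters are fitted with a plain k-means on all training instances, that each instance of a bag is assigned to its nearest centroid, and that the bag is summarized by the per-cluster feature means. The function names (`fit_centroids`, `compress_bag`) and the toy data are hypothetical.

```python
import random
from statistics import fmean


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda k: sq_dist(point, centroids[k]))


def fit_centroids(instances, n_clusters, n_iter=20, seed=0):
    """Plain k-means on the pooled training instances (assumed clustering step)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(instances, n_clusters)]
    for _ in range(n_iter):
        groups = [[] for _ in range(n_clusters)]
        for p in instances:
            groups[nearest(p, centroids)].append(p)
        for k, members in enumerate(groups):
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = [fmean(col) for col in zip(*members)]
    return centroids


def compress_bag(bag, centroids):
    """Map a variable-size bag of instances to one fixed-width row.

    The row concatenates, for every cluster, the mean of the bag's
    instances assigned to that cluster. If the bag has no instance in a
    cluster, the centroid itself is used as a fallback (an assumption).
    """
    groups = [[] for _ in centroids]
    for p in bag:
        groups[nearest(p, centroids)].append(p)
    row = []
    for k, members in enumerate(groups):
        if members:
            row.extend(fmean(col) for col in zip(*members))
        else:
            row.extend(centroids[k])
    return row


# Toy data: 8 bags, each with 5 to 15 instances of 2 features (hypothetical).
data_rng = random.Random(1)
train_bags = [
    [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(data_rng.randint(5, 15))]
    for _ in range(8)
]

# Clusters are built from the training instances only.
all_instances = [p for bag in train_bags for p in bag]
centroids = fit_centroids(all_instances, n_clusters=3)

# One fixed-width row per bag: n_clusters * n_features columns.
X_train = [compress_bag(bag, centroids) for bag in train_bags]
```

Under these assumptions, every bag becomes a row of `n_clusters * n_features` values regardless of how many instances it contains, so the resulting table, paired with one target per bag, has a shape a tabular regressor such as TabPFN can consume.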
Relevance Comments: Our paper proposes a method that makes TabPFN applicable to a larger class of tabular data tasks.
Submission Number: 51