Submission Type: Short paper (4 pages)
Keywords: tabular data, biomedical data, dimensionality reduction
TL;DR: cPCA and PCA can boost TabPFN/TabICL performance on biomedical classification tasks
Abstract: Foundation models trained on tabular data, such as TabPFN and TabICL, have demonstrated strong classification performance on synthetic and benchmark datasets. Applications to biological datasets are emerging but remain comparatively underexplored, as such data are often high-dimensional, noisy, and limited in sample size. In this paper, we present the first comprehensive evaluation of TabPFN and TabICL across diverse biology-specific classification tasks, including healthcare, mass spectrometry, gene splicing, and gene expression datasets. Across all datasets, tabular foundation models outperformed traditional classifiers such as XGBoost and SVC, demonstrating strong generalization to biological data. Models trained on raw features achieved the best accuracies on all datasets except the high-dimensional, noisy gene expression dataset, where TabICL combined with dimensionality reduction (PCA and cPCA) achieved the highest accuracies. These findings provide systematic evidence that tabular foundation models are effective for biological classification tasks and establish a foundation for their broader application in biomedicine.
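The abstract does not specify how PCA and cPCA are applied before classification. The sketch below makes several assumptions. It assumes cPCA means contrastive PCA, taken as the top eigenvectors of C_target minus alpha times C_background. It assumes the projection is fit on training data only. The helper names, `k`, `alpha`, and the choice of background data are all hypothetical. It shows one way reduced features could be computed with NumPy; the downstream TabPFN/TabICL call is not shown.

```python
import numpy as np


def _top_eigenvectors(sym_matrix: np.ndarray, k: int) -> np.ndarray:
    """Return the k eigenvectors (as columns) with the largest eigenvalues of a symmetric matrix."""
    eigvals, eigvecs = np.linalg.eigh(sym_matrix)  # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]
    return eigvecs[:, order]


def pca_components(X: np.ndarray, k: int) -> np.ndarray:
    """Standard PCA: top-k eigenvectors of the feature covariance of X (rows = samples)."""
    return _top_eigenvectors(np.cov(X, rowvar=False), k)


def cpca_components(X_target: np.ndarray, X_background: np.ndarray,
                    k: int, alpha: float = 1.0) -> np.ndarray:
    """Contrastive PCA (assumed formulation): top-k eigenvectors of
    C_target - alpha * C_background, i.e. directions with high variance in the
    target data but low variance in the (hypothetical) background data."""
    c_target = np.cov(X_target, rowvar=False)
    c_background = np.cov(X_background, rowvar=False)
    return _top_eigenvectors(c_target - alpha * c_background, k)


def project(X: np.ndarray, components: np.ndarray, mean: np.ndarray) -> np.ndarray:
    """Center X with the training-set mean and project it onto the chosen axes."""
    return (X - mean) @ components


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(100, 50))  # synthetic stand-in for a gene expression matrix
    X_test = rng.normal(size=(20, 50))
    W = pca_components(X_train, k=10)      # fit on the training split only
    mu = X_train.mean(axis=0)
    Z_train, Z_test = project(X_train, W, mu), project(X_test, W, mu)
    # Z_train / Z_test would replace the raw features as input to a tabular classifier.
    print(Z_train.shape, Z_test.shape)
```

Fitting the axes and the centering mean on the training split alone keeps test samples out of the dimensionality reduction. With `alpha=0`, the contrastive variant reduces to ordinary PCA on the target data.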
Submission Number: 37