Keywords: Foundation Models, TabPFN, Tabular data, Model Robustness, Noise induction
TL;DR: Assessing the Robustness of the Tabular Prior-Data Fitted Network Classifier
Abstract: Label noise is a common and critical challenge in real-world machine learning, especially in tabular data settings, where mislabeled instances can severely degrade model performance and generalization. This study investigates the robustness of the Tabular Prior-Data Fitted Network (TabPFN), a transformer-based model, under varying levels of label noise in binary classification tasks. Using 15 publicly available tabular datasets from OpenML, we systematically inject label noise at multiple levels (0%, 1%, 5%, 10%, 20%, 25%, and 30%) and evaluate TabPFN against seven traditional classifiers: Random Forest (RF), Extreme Gradient Boosting (XGBoost), LightGBM (LGBM), Support Vector Machine (SVM), k-Nearest Neighbors (kNN), Category Boosting (CatBoost), and Decision Tree (DT). All models are assessed using 2×5-fold stratified cross-validation, and their performance is reported in terms of average accuracy and AUC-ROC. Our experimental results reveal clear performance trends across classifier types: boosting-based models are the most sensitive to label noise, RF demonstrates moderate robustness and maintains relatively stable performance across noise levels, and TabPFN consistently exhibits superior resilience. These findings confirm the potential of TabPFN as a robust and noise-tolerant solution for real-world tabular classification tasks.
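The following is a minimal sketch of the evaluation protocol described in the abstract (label-noise injection at fixed levels, 2×5-fold stratified cross-validation, accuracy and AUC-ROC), not the authors' code. It assumes the `tabpfn` package's `TabPFNClassifier` and substitutes a synthetic binary dataset for the OpenML ones; for simplicity it flips labels across the whole dataset before splitting, whereas the study's exact injection point (e.g. training folds only) may differ.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import RepeatedStratifiedKFold, cross_validate
from tabpfn import TabPFNClassifier

def inject_label_noise(y, noise_level, rng):
    """Flip a `noise_level` fraction of binary labels uniformly at random."""
    y_noisy = y.copy()
    n_flip = int(round(noise_level * len(y)))
    flip_idx = rng.choice(len(y), size=n_flip, replace=False)
    y_noisy[flip_idx] = 1 - y_noisy[flip_idx]  # assumes labels in {0, 1}
    return y_noisy

rng = np.random.default_rng(42)
# Synthetic stand-in for one OpenML binary-classification dataset.
X, y = make_classification(n_samples=500, n_features=20, random_state=42)

# 2x5-fold stratified cross-validation, as reported in the study.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=2, random_state=42)

for noise in [0.0, 0.01, 0.05, 0.10, 0.20, 0.25, 0.30]:
    y_noisy = inject_label_noise(y, noise, rng)
    scores = cross_validate(
        TabPFNClassifier(), X, y_noisy,
        cv=cv, scoring=["accuracy", "roc_auc"],
    )
    print(f"noise={noise:.2f}  "
          f"acc={scores['test_accuracy'].mean():.3f}  "
          f"auc={scores['test_roc_auc'].mean():.3f}")
```

The baseline classifiers (RF, XGBoost, LGBM, SVM, kNN, CatBoost, DT) can be swapped into the same `cross_validate` call, since all expose the scikit-learn estimator interface.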
Submission Number: 20