Submission Type: Short paper (4 pages)
Keywords: Tabular Foundation Models, Fine-tuning, Synthetic Data, In-context Learning, Structural Causal Models
TL;DR: CausalMixFT improves fine-tuning of tabular foundation models under data scarcity by generating causally consistent synthetic data, boosting performance and validation reliability compared to existing augmentation methods.
Abstract: Fine-tuning tabular foundation models (TFMs) in the face of scarce data is challenging, as early stopping on even scarcer validation data often fails to capture true
generalization performance. We propose CausalMixFT, a method that enhances
fine-tuning robustness and downstream performance by generating structurally
consistent synthetic samples using Structural Causal Models (SCMs) fitted on the
target dataset. This approach augments limited real data with causally informed
synthetic examples, preserving feature dependencies while expanding training
diversity. Evaluated across 33 classification datasets from TabArena and over 2,300
fine-tuning runs, CausalMixFT consistently increases the benefit of fine-tuning,
raising the median normalized ROC-AUC improvement from 0.10 (standard
fine-tuning) to 0.12 and outperforming purely statistical generators such as CTGAN
(-0.01), TabEBM (-0.04), and TableAugment (-0.09). Moreover, it narrows the median
validation-test performance correlation gap from 0.67 to 0.30, enabling more
reliable validation-based early stopping, a key step toward improving fine-tuning
stability under data scarcity. These results demonstrate that incorporating causal
structure into
data augmentation provides an effective and principled route to fine-tuning tabular
foundation models in low-data regimes.
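For concreteness, the sketch below illustrates the general idea of SCM-based augmentation described in the abstract: fit one structural equation per feature along a causal order, then draw synthetic rows by ancestral sampling and mix them with the real data. This is a minimal illustration, not the authors' released code; the causal order is assumed given (in practice it would come from a causal-discovery step), the structural equations are modeled as additive-noise regressions with bootstrapped residuals, and label handling is omitted. All function names (fit_scm, sample_scm) are hypothetical.

```python
# Minimal sketch of SCM-based data augmentation (assumptions: known causal
# order, additive-noise structural equations, continuous features only).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def fit_scm(X, order):
    """Fit one additive-noise structural equation per feature.

    X: (n, d) feature matrix from the scarce target dataset.
    order: topological order of feature indices (assumed given here).
    Returns per-feature (model, residuals); roots keep their empirical values.
    """
    models = {}
    for i, node in enumerate(order):
        parents = order[:i]
        if not parents:
            models[node] = (None, X[:, node])  # root: empirical marginal
            continue
        reg = RandomForestRegressor(n_estimators=50, random_state=0)
        reg.fit(X[:, parents], X[:, node])
        resid = X[:, node] - reg.predict(X[:, parents])
        models[node] = (reg, resid)
    return models

def sample_scm(models, order, n, rng):
    """Ancestral sampling: generate each feature from its fitted equation."""
    Z = np.empty((n, len(order)))
    for i, node in enumerate(order):
        reg, resid = models[node]
        noise = rng.choice(resid, size=n, replace=True)  # bootstrap the noise
        if reg is None:
            Z[:, node] = noise  # root node: resample its marginal
        else:
            Z[:, node] = reg.predict(Z[:, order[:i]]) + noise
    return Z

# Toy usage on a synthetic 3-variable chain x0 -> x1 -> x2.
rng = np.random.default_rng(0)
x0 = rng.normal(size=200)
x1 = 2.0 * x0 + rng.normal(scale=0.5, size=200)
x2 = np.sin(x1) + rng.normal(scale=0.1, size=200)
X_real = np.column_stack([x0, x1, x2])

order = [0, 1, 2]                      # assumed known causal order
models = fit_scm(X_real, order)
X_syn = sample_scm(models, order, n=400, rng=rng)

# Augment the scarce real data with causally consistent synthetic rows;
# the real/synthetic mixing ratio is a free knob in this sketch.
X_train = np.vstack([X_real, X_syn])
print(X_train.shape)  # (600, 3)
```

Because each synthetic feature is generated from its fitted parents plus resampled residual noise, the samples respect the dependency structure captured by the SCM rather than just the marginal statistics, which is the property the abstract attributes to causally informed augmentation.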
Submission Number: 19