Keywords: Synthetic Data, Fine-tuning, Tabular Foundation Models
TL;DR: Synthetic Data Generation for Fine-tuning Tabular Foundation Models
Abstract: Tabular foundation models pre-trained on synthetically generated datasets have exhibited strong in-context learning capabilities.
While fine-tuning can further enhance predictive performance, overfitting to the training data of a downstream task poses a significant risk in tiny-to-small data regimes. We propose a fine-tuning method that employs synthetically generated fine-tuning data to avoid overfitting and improve generalization. We study three variants of data generation methods and empirically demonstrate that they mitigate overfitting and outperform standard fine-tuning approaches across five tiny-to-small real-world datasets. The best-performing variants leverage density estimators and structural causal models akin to those employed during pre-training.
Our findings indicate that synthetic data generation, a central element in pre-training, can be successfully adapted to enhance fine-tuning.
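To make the structural-causal-model idea concrete, below is a minimal sketch of how one might sample a toy tabular dataset from a randomly drawn SCM, in the spirit of the prior-style generators the abstract alludes to. Everything here is an illustrative assumption, not the paper's actual generator: the function name `sample_scm_dataset`, the edge probability, the tanh link, and the median-threshold target are all hypothetical choices.

```python
import numpy as np

def sample_scm_dataset(n_samples=256, n_features=8, seed=0):
    """Sample a dataset from a random structural causal model:
    a random DAG whose nodes are nonlinear functions of their
    parents plus noise; one node is relabeled as the target.
    (Hypothetical sketch, not the paper's generator.)"""
    rng = np.random.default_rng(seed)
    n_nodes = n_features + 1  # feature nodes plus one target node

    # Strictly upper-triangular adjacency => guaranteed acyclic graph.
    adj = np.triu(rng.random((n_nodes, n_nodes)) < 0.4, k=1)

    # Random edge weights on existing edges, per-node noise scales.
    weights = rng.normal(size=(n_nodes, n_nodes)) * adj
    noise_scale = rng.uniform(0.1, 1.0, size=n_nodes)

    # Propagate values through the DAG in topological (index) order.
    values = np.zeros((n_samples, n_nodes))
    for j in range(n_nodes):
        parent_signal = values @ weights[:, j]  # parents have index < j
        values[:, j] = np.tanh(parent_signal) + noise_scale[j] * rng.normal(size=n_samples)

    # Threshold the last node to obtain a binary classification target.
    X = values[:, :-1]
    y = (values[:, -1] > np.median(values[:, -1])).astype(int)
    return X, y

if __name__ == "__main__":
    X, y = sample_scm_dataset()
    print(X.shape, y.mean())  # (256, 8), roughly balanced classes
```

Datasets drawn this way could, under these assumptions, serve as auxiliary fine-tuning data alongside the (tiny) real training set, which is the general mechanism the abstract describes for mitigating overfitting.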
Submission Number: 106