Towards Synthetic Data for Fine-tuning Tabular Foundation Models

Published: 09 Jun 2025, Last Modified: 09 Jun 2025 · FMSD @ ICML 2025 · CC BY 4.0
Keywords: Synthetic Data, Fine-tuning, Tabular Foundation Models
TL;DR: Synthetic Data Generation for Fine-tuning Tabular Foundation Models
Abstract: Tabular foundation models pre-trained on synthetically generated datasets have exhibited strong in-context learning capabilities. While fine-tuning can further enhance predictive performance, overfitting to the training data of a downstream task poses a significant risk in tiny-to-small data regimes. We propose a fine-tuning method that employs synthetically generated fine-tuning data to avoid overfitting and improve generalization performance. We study three variants of data generation methods and empirically demonstrate that they mitigate overfitting and outperform standard fine-tuning approaches across five tiny-to-small real-world datasets. Our data generation methods leverage density estimators and structural causal models, akin to those employed during pre-training, to yield the best performance. Our findings indicate that synthetic data generation, a central element in pre-training, can be successfully adapted to enhance fine-tuning.
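The abstract describes generating synthetic fine-tuning data with density estimators fitted on the downstream task. As a minimal sketch of this idea (not the paper's actual method, whose generators and labeling scheme are not specified here), one simple density-based variant draws synthetic rows from a Gaussian kernel density estimate over the real training features and lets each synthetic row inherit the label of its seed row; the function name `kde_augment` and all parameters are hypothetical.

```python
import numpy as np

def kde_augment(X, y, n_synth, bandwidth=0.1, seed=0):
    """Sample synthetic rows from a Gaussian KDE fitted on real features.

    Illustrative assumption: each synthetic row is a real row plus
    per-feature-scaled Gaussian noise (equivalent to sampling from a
    Gaussian KDE centered on the training rows) and inherits the label
    of the row it was seeded from.
    """
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(X), size=n_synth)          # choose seed rows
    noise = rng.normal(0.0, bandwidth, size=(n_synth, X.shape[1]))
    X_synth = X[idx] + noise * X.std(axis=0)             # scale noise per feature
    return X_synth, y[idx]

# Tiny-data demo: 20 real rows with 4 features, augmented to 100 synthetic rows.
X = np.random.default_rng(1).normal(size=(20, 4))
y = (X[:, 0] > 0).astype(int)
X_synth, y_synth = kde_augment(X, y, n_synth=100)
print(X_synth.shape, y_synth.shape)  # (100, 4) (100,)
```

Fine-tuning on such sampled rows, rather than repeatedly on the handful of real rows, is one way to reduce overfitting in the tiny-to-small data regimes the paper targets.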
Submission Number: 106