Keywords: Synthetic Data, Tabular Data, Synthetic Validation, Calibration, Ranking Fidelity
TL;DR: We propose a loss-based calibration method that reweights synthetic tabular data to reliably preserve model rankings for selection, significantly improving synthetic validation fidelity across classification and regression tasks.
Abstract: Model selection in tabular machine learning typically relies on held-out validation or cross-validation, which reduces effective training data and increases computational cost. Recent work suggests that synthetic data can serve as a validation surrogate if it preserves the ranking of candidate models, but generator imperfections in tabular domains often make uncalibrated synthetic evaluation unreliable. We propose a simple, task-agnostic calibration procedure for synthetic tabular validation: given a synthetic validation set and a small calibration subset of models, we learn sample weights by aligning weighted synthetic losses with cross-fitted real-data loss estimates, yielding a calibrated synthetic risk for ranking and selection. The method generalizes correctness-based schemes to arbitrary losses and supports both classification and regression. Across five classification and five regression benchmarks and multiple generators (CTGAN, TVAE, Gaussian Copula, TabDDPM, TabPFN-based generation), calibration consistently improves ranking fidelity measured by Spearman correlation of synthetic and real model orderings, with large gains when generators are weaker. Finally, we study weight interpretability by training a surrogate regressor to predict weights from sample values and analyzing SHAP attributions, revealing that weight assignment is driven primarily by the synthetic target and a few task-relevant covariates, providing insight into when and why calibration succeeds.
Submission Number: 14
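The abstract describes learning sample weights that align weighted synthetic losses with cross-fitted real-data loss estimates. The sketch below is a minimal illustration of that idea, not the authors' implementation: the squared alignment objective, the L2 anchor toward uniform weights, the projected-gradient solver, and all names (`calibrate_weights`, `calibrated_risk`, arguments) are assumptions introduced here for clarity.

```python
import numpy as np

def calibrate_weights(synth_losses, real_risks, l2=1e-3, lr=0.1, n_steps=500):
    """Illustrative loss-based calibration of synthetic validation weights.

    synth_losses : (M, N) array, per-sample synthetic losses for M calibration models
    real_risks   : (M,)  array, cross-fitted real-data risk estimates for the same models
    Returns nonnegative weights w (N,) averaging to 1, chosen so the weighted
    synthetic risk (1/N) * synth_losses @ w approximates real_risks.
    """
    M, N = synth_losses.shape
    w = np.ones(N)                                # start from uniform weights
    for _ in range(n_steps):
        pred = synth_losses @ w / N               # weighted synthetic risk per calibration model
        resid = pred - real_risks                 # alignment error against real-data risks
        # gradient of 0.5*||resid||^2 + 0.5*l2*||w - 1||^2 with respect to w
        grad = synth_losses.T @ resid / N + l2 * (w - 1.0)
        w = np.clip(w - lr * grad, 0.0, None)     # projected gradient step (keep w >= 0)
    return w * N / w.sum()                        # renormalize so weights average to 1

def calibrated_risk(per_sample_losses, w):
    """Calibrated synthetic risk of a new candidate model (per-sample losses on the synthetic set)."""
    return float(per_sample_losses @ w) / len(w)
```

Under this reading, candidate models are then ranked by `calibrated_risk` on the synthetic validation set, with the weights fitted once on the small calibration subset of models.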