TabImpute: Accurate and Fast Zero-Shot Missing-Data Imputation with a Pre-Trained Transformer

16 Sept 2025 (modified: 11 Feb 2026)Submitted to ICLR 2026EveryoneRevisionsBibTeXCC BY 4.0
Keywords: missing data, matrix completion, transformer, tabular
TL;DR: We develop a pre-trained transformer called TabImpute to produce accurate and fast zero-shot imputations for tabular missing data.
Abstract: Missing data is a pervasive problem in tabular settings. Existing solutions range from simple averaging to complex generative adversarial networks, but due to each method’s large variance in performance across real-world domains and time-consuming hyperparameter tuning, no default imputation method exists. Building on TabPFN, a recent tabular foundation model for supervised learning, we propose TabImpute, a pre-trained transformer that delivers accurate and fast zero-shot imputations requiring no fitting or hyperparameter tuning at inference-time. To train and evaluate TabImpute, we introduce (i) an entry-wise featurization for tabular settings, which enables a $100\times$ speedup over the previous TabPFN imputation method, (ii) a synthetic training data generation pipeline incorporating realistic missingness patterns, and (iii) MissBench, a comprehensive benchmark with $42$ OpenML datasets and $13$ new missingness patterns. MissBench spans domains such as medicine, finance, and engineering, showcasing TabImpute’s robust performance compared to $12$ established imputation methods.
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 7944
Loading