MISS: An Incomplete Tabular Data Representation System with Missing Mechanism Learning

Published: 2025, Last Modified: 23 Jan 2026, ICDE 2025, CC BY-SA 4.0
Abstract: Missing data is pervasive in real-life scenarios. Analyzing incomplete data through imputation can amplify errors or bias, which hinders effective analysis. Existing tabular data representation methods overlook the missing state of data values and thus cannot deal effectively with incomplete data. In this paper, we propose a novel incomplete tabular data representation system, named MISS. It enables all Transformer-based tabular representation methods to handle incomplete data effectively. MISS consists of two modules: missing mechanism learning (MML) and incomplete data representation (IDR). MML leverages a new missingness propensity score calculation strategy to learn the observed data distribution and the missing mechanisms within incomplete data. IDR introduces a novel probability-driven Transformer block, in conjunction with an unbiased representation loss function, for effective representation. We prove that MISS can eliminate the bias resulting from missingness. Extensive experiments on four public real-world datasets demonstrate that MISS yields an accuracy gain of more than 57% with competitive efficiency, compared with state-of-the-art approaches.
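The abstract does not spell out how propensity scores remove missingness bias, so the following is only a generic illustration of the underlying idea, not the MISS algorithm. It is a minimal sketch of classical inverse propensity weighting, assuming the probability that each cell is observed is known exactly (MISS instead learns it in the MML module). All names (`ipw_mean`, `simulate`) and the synthetic data are hypothetical.

```python
import random


def ipw_mean(values, observed, propensity):
    """Inverse-propensity-weighted estimate of the mean over ALL cells.

    values[i] is only used where observed[i] is True; each observed cell
    is up-weighted by 1 / propensity[i] to stand in for similar missing cells.
    """
    total = 0.0
    for v, o, p in zip(values, observed, propensity):
        if o:
            total += v / p
    return total / len(values)


def simulate(n=100_000, seed=0):
    """Synthetic column whose larger values are observed more often (assumed setup)."""
    rng = random.Random(seed)
    values, observed, propensity = [], [], []
    for _ in range(n):
        x = rng.random()          # fully observed covariate in [0, 1)
        v = 10.0 * x              # cell value, correlated with x
        p = 0.2 + 0.7 * x         # probability that this cell is observed
        values.append(v)
        observed.append(rng.random() < p)
        propensity.append(p)
    return values, observed, propensity


values, observed, propensity = simulate()

# Ground truth, available here only because the data is simulated.
true_mean = sum(values) / len(values)

# Naive estimate: average the observed cells and ignore why others are missing.
naive_mean = sum(v for v, o in zip(values, observed) if o) / sum(observed)

# Weighted estimate: correct for the uneven chance of being observed.
weighted_mean = ipw_mean(values, observed, propensity)

print(f"true={true_mean:.3f} naive={naive_mean:.3f} ipw={weighted_mean:.3f}")
```

In this simulation the naive average is pulled upward because large values are observed more often, while the weighted estimate stays close to the full-data mean. The same weighting can be applied to per-cell loss terms, which is one standard way to obtain an unbiased loss under missingness; whether MISS's loss takes this exact form is not stated in the abstract.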