Keywords: Tabular data, machine learning, weakly supervised learning
Abstract: Tabular data plays a vital role in a wide range of real-world applications.
However, previous methods for tabular data learning have primarily focused on closed environments, overlooking the fact that feature missingness and distribution shift can occur simultaneously in open environments.
In this paper, we are the first to investigate \textbf{R}obust \textbf{T}abular prediction under the \textbf{C}oupled \textbf{S}hifts of feature missingness and distributional change, which we call the \setting problem.
We identify three challenges in \setting, where column missingness and distribution shifts are interdependent and each hinders the handling of the other: (1) the coexistence of column missingness and distribution shifts leads to severe performance degradation, for which no effective solutions currently exist; (2) under distribution shifts, it is inherently difficult to obtain reliable statistical patterns for imputing missing features; and (3) mitigating the information loss from missing features while maintaining robustness to distribution shifts remains highly challenging.
To this end, we propose the \textbf{K}nowledge-\textbf{G}uided \textbf{C}oupled \textbf{S}hift handler for \textbf{Tab}ular data, namely \algo,
which disentangles feature missingness from distribution shifts by imputing missing columns with knowledge-guided recovery rules, and which adapts to unknown distributions through model selection with a theoretical guarantee. Experimental results demonstrate that \algo achieves a performance gain of nearly 20\%.
Supplementary Material: zip
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 1424