Abstract: Deep tabular models have demonstrated remarkable success on i.i.d. data, excelling in a variety of structured data tasks.
However, their performance often deteriorates under temporal distribution shifts, where the data distribution evolves over time and exhibits trends and periodic patterns.
In this paper, we explore the underlying reasons for this failure in capturing temporal dependencies.
We begin by investigating the training protocol, revealing a key issue in how the data is split for model training and validation.
While existing approaches typically use temporal ordering for splitting, we show that even a random split significantly improves model performance.
By jointly reducing training lag and validation bias to improve generalization, our proposed splitting protocol offers substantial improvements across a variety of methods.
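As a minimal sketch of the two splitting strategies contrasted above (assuming NumPy arrays and a fixed validation fraction; the helper names are illustrative and this is not the paper's exact protocol):

# Hypothetical sketch: temporal vs. random train/validation splits.
import numpy as np

def temporal_split(X, y, timestamps, val_frac=0.2):
    # Hold out the most recent val_frac of rows as validation.
    order = np.argsort(timestamps)
    cut = int(len(order) * (1 - val_frac))
    train_idx, val_idx = order[:cut], order[cut:]
    return (X[train_idx], y[train_idx]), (X[val_idx], y[val_idx])

def random_split(X, y, val_frac=0.2, seed=0):
    # Hold out a uniformly random val_frac of rows as validation.
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(X))
    cut = int(len(perm) * (1 - val_frac))
    train_idx, val_idx = perm[:cut], perm[cut:]
    return (X[train_idx], y[train_idx]), (X[val_idx], y[val_idx])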
Furthermore, we analyze how temporal data affects deep tabular representations, uncovering that these models often fail to capture crucial periodic and trend information.
To address this gap, we introduce a plug-and-play temporal embedding based on Fourier series expansion to learn and incorporate temporal patterns, offering an adaptive approach to handle temporal shifts.
Our experiments demonstrate that this temporal embedding, combined with the improved splitting strategy, provides a more effective and robust framework for learning from temporal tabular data.
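As a minimal PyTorch sketch of a Fourier-series temporal embedding in the spirit of the one described above (the module name, dimensions, and use of learnable frequencies are illustrative assumptions, not the paper's exact design):

# Hypothetical sketch: embed a timestamp via learnable Fourier features.
import torch
import torch.nn as nn

class FourierTemporalEmbedding(nn.Module):
    def __init__(self, num_frequencies=16, embed_dim=32):
        super().__init__()
        # Learnable frequencies let the model adapt to periodic patterns.
        self.freqs = nn.Parameter(torch.randn(num_frequencies))
        # Project [sin, cos] features to the backbone's embedding width.
        self.proj = nn.Linear(2 * num_frequencies, embed_dim)

    def forward(self, t):
        # t: (batch,) normalized timestamps; returns (batch, embed_dim).
        phase = t.unsqueeze(-1) * self.freqs
        feats = torch.cat([torch.sin(phase), torch.cos(phase)], dim=-1)
        return self.proj(feats)

In such a plug-and-play setup, the resulting temporal embedding would typically be concatenated with the model's feature embeddings before the tabular backbone.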
Lay Summary: This paper aims to advance the field of machine learning by addressing the critical challenge of temporal distribution shifts in tabular data, which frequently occur in real-world applications. The proposed temporal training protocol and temporal embedding method offer practical improvements for deploying existing tabular models in open environments.
Link To Code: https://github.com/LAMDA-Tabular/Tabular-Temporal-Shift
Primary Area: Deep Learning
Keywords: Machine learning on tabular data, Temporal shift, Deep tabular learning
Submission Number: 10139