Keywords: tabular data, deep learning, uncertainty
TL;DR: We show that many tabular deep learning methods achieve their performance gains primarily in regions of high data uncertainty. We study the reasons behind this and develop a novel numerical feature embedding method based on our analysis.
Abstract: Recent advancements in tabular deep learning have demonstrated exceptional practical performance, yet the field often lacks a clear understanding of why these techniques actually succeed. To address this gap, our paper highlights the importance of the concept of data uncertainty for explaining the effectiveness of recent tabular DL methods. In particular, we reveal that the success of many beneficial design choices in tabular DL, such as numerical feature embeddings, retrieval-augmented models, and advanced ensembling strategies, can be partially attributed to their implicit mechanisms for performing well under high data uncertainty. By dissecting these mechanisms, we provide a unifying understanding of recent performance improvements. Furthermore, the insights derived from this data-uncertainty perspective directly enabled us to develop more effective numerical feature embeddings as an immediate practical outcome of our analysis. Overall, our work paves the way toward a foundational understanding of the benefits introduced by modern tabular methods, yielding concrete advancements of existing techniques and outlining future research directions for tabular DL.
Supplementary Material: zip
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 7781