Visible to everyone since 13 Oct 2023
While deep learning has demonstrated impressive results on many data types, it continues to lag behind tree-based methods on tabular data, often referred to as the last “unconquered castle” for neural networks. We hypothesize that a significant advantage of tree-based methods lies in their intrinsic capability to model and exploit non-linear interactions induced by features with categorical characteristics. In contrast, neural methods are biased toward uniform numerical processing of features and toward smooth solutions, which makes it difficult for them to exploit such patterns. We aim to close this performance gap in two ways. First, we use simple statistical feature processing techniques to identify and explicitly encode features that are strongly correlated with the target once discretized. Second, we use Learned Fourier Features to mitigate the bias of deep models toward overly smooth solutions, a bias that does not align with the inherent properties of the data. Our proposed feature processing and method achieve performance that closely matches or surpasses XGBoost on a comprehensive tabular data benchmark.
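To make the two ingredients concrete, the following is a minimal NumPy sketch, not the authors' implementation. The abstract does not name the statistic used to detect features that become predictive once discretized, so the sketch assumes a correlation ratio (eta-squared) over quantile bins and compares it with the squared Pearson correlation. The Fourier mapping shows only the forward pass with randomly drawn frequencies; in a learned variant those frequencies would presumably be trained jointly with the network. The names `correlation_ratio`, `fourier_features`, `n_bins`, and `freqs` are hypothetical.

```python
import numpy as np


def correlation_ratio(x, y, n_bins=8):
    """Eta-squared of y explained by quantile bins of x (0 = none, 1 = all).

    Assumed statistic for illustration; the source does not specify one.
    """
    # Interior quantile edges, so every bin holds roughly the same number of points.
    edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    bins = np.digitize(x, edges)  # bin index in [0, n_bins - 1]
    total = np.sum((y - y.mean()) ** 2)
    if total == 0.0:
        return 0.0
    between = 0.0
    for b in np.unique(bins):
        members = y[bins == b]
        between += members.size * (members.mean() - y.mean()) ** 2
    return float(between / total)


def fourier_features(x, freqs):
    """Map x of shape (n, d) to [cos(2*pi*x@freqs), sin(2*pi*x@freqs)], shape (n, 2k)."""
    proj = 2.0 * np.pi * x @ freqs
    return np.concatenate([np.cos(proj), np.sin(proj)], axis=1)


rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=2000)
# Piecewise-constant, non-monotone target: nearly invisible to a linear statistic.
y = np.where(np.abs(x) < 0.5, 1.0, 0.0)

linear_r2 = float(np.corrcoef(x, y)[0, 1] ** 2)
binned_eta2 = correlation_ratio(x, y)

# Random frequencies stand in for parameters that would be trained (assumption).
freqs = rng.normal(scale=1.0, size=(1, 4))
phi = fourier_features(x[:, None], freqs)

print(f"squared Pearson correlation: {linear_r2:.3f}")
print(f"correlation ratio after binning: {binned_eta2:.3f}")
print(f"Fourier feature matrix shape: {phi.shape}")
```

On this toy target the binned statistic should be far larger than the linear one, which is the kind of gap that would flag a feature for explicit categorical encoding. The Fourier mapping gives a downstream network access to higher-frequency components of the input, which is the usual motivation for using it against a smoothness bias.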