Closing the gap on tabular data with Fourier and Implicit Categorical Features

Marius Dragoi; Florin Gogianu; Elena Burceanu

23 Sept 2023 (modified: 11 Feb 2024). Submitted to ICLR 2024.

Primary Area: general machine learning (i.e., none of the above)

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Keywords: tabular data, neural networks, feature processing, deep learning, tree-based methods, xgboost

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.

Abstract:

While deep learning has demonstrated impressive results on a variety of data types, it continues to lag behind tree-based methods on tabular data, which is often referred to as the last “unconquered castle” for neural networks. We hypothesize that a significant advantage of tree-based methods lies in their intrinsic ability to model and exploit the non-linear interactions induced by features with categorical characteristics. In contrast, neural methods are biased toward uniform numerical processing of features and toward smooth solutions, which makes it difficult for them to leverage such patterns effectively. We address this performance gap in two ways. First, we use simple, statistics-based feature processing to identify features that are strongly correlated with the target once discretized, and we encode them explicitly. Second, we use Learned Fourier Features to mitigate the bias of deep models toward overly smooth solutions, a bias that does not match the inherent properties of the data. Our proposed feature processing and method achieve performance that closely matches or surpasses XGBoost on a comprehensive tabular data benchmark.
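To make the first idea concrete, here is a minimal sketch of one way to flag features that become strongly associated with the target once discretized. It assumes a criterion chosen purely for illustration, not the authors' exact statistics: each column is discretized (its raw values are kept when it has few distinct ones, otherwise it is split into quantile bins), and the correlation ratio η² between the bin codes and a numeric target is compared to a threshold. The function names (`discretize`, `find_implicit_categorical`, `one_hot`) and the defaults (`n_bins`, `max_unique`, `threshold`) are hypothetical.

```python
import numpy as np


def correlation_ratio(codes: np.ndarray, y: np.ndarray) -> float:
    """Eta squared: share of Var(y) explained by the per-code means of y."""
    total_var = y.var()
    if total_var == 0:
        return 0.0
    between = sum(
        np.mean(codes == c) * (y[codes == c].mean() - y.mean()) ** 2
        for c in np.unique(codes)
    )
    return float(between / total_var)


def discretize(x: np.ndarray, n_bins: int = 16, max_unique: int = 32) -> np.ndarray:
    """Integer codes for x: its own values if few are distinct, else quantile bins."""
    values, codes = np.unique(x, return_inverse=True)
    if values.size <= max_unique:
        return codes
    # Interior quantile edges; duplicate edges (heavily repeated values) are merged.
    edges = np.unique(np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1)[1:-1]))
    return np.searchsorted(edges, x, side="right")


def find_implicit_categorical(
    X: np.ndarray, y: np.ndarray, threshold: float = 0.1, **kwargs
) -> list:
    """Indices of columns whose discretized version explains >= threshold of Var(y)."""
    return [
        j
        for j in range(X.shape[1])
        if correlation_ratio(discretize(X[:, j], **kwargs), y) >= threshold
    ]


def one_hot(codes: np.ndarray) -> np.ndarray:
    """Explicit categorical (one-hot) encoding of integer codes."""
    return np.eye(codes.max() + 1)[codes]
```

The flagged columns can then be passed to the network as one-hot (or embedded) categories alongside their numeric values. η² is only meaningful for a numeric target; for classification, a contingency-table statistic such as a chi-squared test would be the analogous choice.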

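The second idea, Learned Fourier Features, can be sketched as an input layer that projects the features with a frequency matrix B and applies sine and cosine, so that the downstream MLP can represent less smooth functions of its inputs. This is a minimal PyTorch sketch under stated assumptions: a single B shared across all input features, Gaussian initialization with scale `sigma`, and B trained jointly with the rest of the network. The class name and hyperparameters are hypothetical and are not taken from the paper.

```python
import math

import torch
from torch import nn


class LearnedFourierFeatures(nn.Module):
    """Maps x in R^d to [cos(2*pi*x B), sin(2*pi*x B)] with a trainable matrix B."""

    def __init__(self, in_features: int, n_frequencies: int, sigma: float = 1.0):
        super().__init__()
        # B starts as Gaussian random frequencies but, unlike fixed random
        # Fourier features, is an nn.Parameter updated by the optimizer.
        self.B = nn.Parameter(torch.randn(in_features, n_frequencies) * sigma)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        proj = 2 * math.pi * x @ self.B  # shape: (batch, n_frequencies)
        return torch.cat([torch.cos(proj), torch.sin(proj)], dim=-1)


# Hypothetical usage: Fourier features feeding a small MLP regressor.
model = nn.Sequential(
    LearnedFourierFeatures(in_features=8, n_frequencies=64, sigma=1.0),
    nn.Linear(128, 256),  # 2 * n_frequencies inputs (cos and sin parts)
    nn.ReLU(),
    nn.Linear(256, 1),
)
```

In this sketch, `sigma` sets the initial frequency scale: larger values let the network fit sharper, less smooth functions from the start, which is the lever against the smoothness bias. Because B is trainable, the frequencies can also adapt to the data.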
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 8040
