Abstract: Tabular data is a ubiquitous data modality due to its versatility and ease of use in many real-world applications. The predominant approaches to classification on tabular data still rely on classical machine learning techniques, as the superiority of deep learning models has yet to be demonstrated in this domain. This raises the question of whether new deep learning paradigms can surpass the classical approaches. Recent studies on tabular data highlight the limitations of neural networks in this setting and the superiority of gradient-boosted decision trees (GBDTs) in terms of scalability and robustness across diverse datasets. However, novel foundation models have not yet been thoroughly evaluated or fairly compared to existing methods for tabular classification. Our study categorizes ten state-of-the-art neural models by their underlying learning paradigm and demonstrates, in particular, that meta-learned foundation models outperform GBDTs in small-data regimes. Although dataset-specific neural networks generally outperform LLM-based tabular classifiers, they are surpassed by an AutoML library, which achieves the best overall performance, albeit at higher computational cost.
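For context, the kind of small-data comparison behind the abstract's central claim can be illustrated in a few lines. The sketch below is not the paper's benchmark harness; it assumes scikit-learn and the `tabpfn` package, whose `TabPFNClassifier` is one example of a meta-learned tabular foundation model.

```python
# Minimal illustrative sketch (not the paper's evaluation code): compare a GBDT
# against a meta-learned foundation model in a small-data regime.
# Assumes scikit-learn and the `tabpfn` package (pip install tabpfn).
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from tabpfn import TabPFNClassifier

X, y = load_breast_cancer(return_X_y=True)
# Keep only 128 training instances to mimic a small-data regime.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=128, random_state=0, stratify=y
)

for name, clf in [
    ("GBDT", GradientBoostingClassifier(random_state=0)),
    ("TabPFN", TabPFNClassifier(device="cpu")),  # meta-learned, no per-dataset tuning
]:
    clf.fit(X_train, y_train)
    acc = accuracy_score(y_test, clf.predict(X_test))
    print(f"{name}: test accuracy = {acc:.3f}")
```

Note that the foundation model is applied with no per-dataset hyperparameter tuning, which is precisely what makes this model family attractive when training data is scarce.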
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission: In response to the reviewers' feedback, we have incorporated the following updates:
### 1. Additional Baselines
- We added two new baselines: *MLP with PLR embeddings* and *RealMLP*.
- Figures **1–4** and **7–15** have been updated to reflect the inclusion of these baselines.
- To analyze their hyperparameter configurations, we introduced **Figures 18 and 19** in Sections **D.3** and **D.4**, respectively.
- Sections **A.7** and **A.8** were added to present the hyperparameter search spaces of the newly introduced baselines.
- We included **Tables 15 and 20** to report the raw results of these baselines, both with and without hyperparameter tuning.
### 2. Modification to Table 1
- **Table 1** was updated to include two additional recent empirical studies.
### 3. Preprocessing Details
- Section **A.14** was added to explain the preprocessing steps applied to all considered methods.
### 4. Analysis on Dataset Sizes
- We introduced **Appendix E**, which thoroughly investigates the performance of different method families across datasets of varying sizes.
- Datasets were categorized based on the number of instances and features, and we added **Figures 27–30** along with **Tables 24–32** to support this analysis.
- Section **E.6** provides a performance profile comparison between small and large dataset domains, supported by **Figures 31 and 32**.
### 5. Ablation on Refitting
- **Appendix F** was added to analyze the impact of refitting.
- This ablation study is presented with **Figures 33–35** and **Tables 33–35**.
All modifications are highlighted in blue for clarity.
Assigned Action Editor: ~Han-Jia_Ye1
Submission Number: 3828