NRGBoost: Energy-Based Generative Boosted Trees

15 May 2024 (modified: 06 Nov 2024) · Submitted to NeurIPS 2024 · CC BY 4.0
Keywords: Generative Models, Energy-Based Models, Gradient Boosting, Tabular Data
Abstract: Despite the rise to dominance of deep learning in unstructured data domains, tree-based methods such as Random Forests (RF) and Gradient Boosted Decision Trees (GBDT) remain the workhorses for discriminative tasks on tabular data. We explore generative extensions of these popular algorithms, focusing on explicitly modeling the data density (up to a normalization constant), thus enabling applications beyond sampling. As our main contribution we propose an effective energy-based generative boosting algorithm analogous to the second-order boosting implemented in popular packages like XGBoost. We show that, despite producing a generative model capable of handling inference tasks over any input variable, our proposed algorithm achieves discriminative performance similar to that of GBDT algorithms on a number of real-world tabular datasets, while outperforming competing approaches for sampling.
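For context, the second-order boosting that the abstract refers to (as implemented in XGBoost) fits each tree against a second-order Taylor expansion of the loss: each leaf's weight is computed from the per-sample gradients and Hessians of the samples it contains. The sketch below is illustrative background, not the paper's energy-based algorithm; the function names and the L2 regularization parameter `lam` are assumptions for this example.

```python
# Second-order (Newton) boosting leaf update, XGBoost-style.
# For a leaf holding samples with gradients g_i and Hessians h_i of the
# loss, the weight minimizing the second-order expansion (with L2
# regularization lam) is:  w* = -sum(g_i) / (sum(h_i) + lam)

def leaf_weight(grads, hess, lam=1.0):
    """Optimal leaf value from the second-order expansion of the loss."""
    return -sum(grads) / (sum(hess) + lam)

def leaf_gain(grads, hess, lam=1.0):
    """Loss reduction contributed by this leaf (used to score splits)."""
    return 0.5 * sum(grads) ** 2 / (sum(hess) + lam)

# Squared-error loss: g_i = pred_i - y_i and h_i = 1, so with lam = 0
# the leaf weight reduces to the mean residual of the leaf.
y = [1.0, 2.0, 3.0, 4.0]
pred = [0.0, 0.0, 0.0, 0.0]
g = [p - t for p, t in zip(pred, y)]
h = [1.0] * len(y)

print(leaf_weight(g, h, lam=0.0))  # 2.5, the mean residual
```

In discriminative GBDT the loss is a supervised objective over a single target; the paper's generative extension instead boosts an unnormalized log-density over all variables.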
Primary Area: Generative models
Submission Number: 15154