Gradient Boosting Reinforcement Learning

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · License: CC BY-NC-SA 4.0
Abstract: We present Gradient Boosting Reinforcement Learning (GBRL), a framework that adapts the strengths of Gradient Boosting Trees (GBTs) to reinforcement learning (RL) tasks. While neural networks (NNs) have become the de facto choice for RL, they face significant challenges with structured and categorical features and tend to generalize poorly to out-of-distribution samples. These are challenges at which GBTs have traditionally excelled in supervised learning. However, the application of GBTs in RL has been limited. Traditional GBT libraries are designed for static datasets with fixed labels, making them incompatible with RL's dynamic nature, where both state distributions and reward signals evolve during training. GBRL overcomes this limitation by continuously interleaving tree construction with environment interaction. Through extensive experiments, we demonstrate that GBRL outperforms NNs in domains with structured observations and categorical features, while maintaining competitive performance on standard continuous control benchmarks. Like its supervised learning counterpart, GBRL demonstrates superior robustness to out-of-distribution samples and better handles irregular state-action relationships.
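The abstract's core idea, growing a boosted-tree ensemble online as new interaction data arrives rather than on a fixed dataset, can be sketched in plain NumPy. The depth-1 stump learner, batch size, learning rate, and the stand-in regression target below are illustrative assumptions for the sketch, not GBRL's actual implementation or API.

```python
# Illustrative sketch (not the GBRL implementation): functional gradient
# boosting where each new tree is fit to residuals on a *fresh* batch,
# mimicking the interleaving of tree construction and data collection.
import numpy as np

class Stump:
    """Depth-1 regression tree: one threshold, two leaf values."""
    def fit(self, x, y):
        order = np.argsort(x)
        xs, ys = x[order], y[order]
        csum, n = np.cumsum(ys), len(ys)
        best = np.inf
        for i in range(1, n):  # scan candidate split points
            lm = csum[i - 1] / i                     # left-leaf mean
            rm = (csum[-1] - csum[i - 1]) / (n - i)  # right-leaf mean
            sse = np.sum((ys[:i] - lm) ** 2) + np.sum((ys[i:] - rm) ** 2)
            if sse < best:
                best = sse
                self.thr = 0.5 * (xs[i - 1] + xs[i])
                self.left, self.right = lm, rm
        return self

    def predict(self, x):
        return np.where(x <= self.thr, self.left, self.right)

class OnlineGBT:
    """Boosted ensemble grown one tree per incoming batch (squared loss)."""
    def __init__(self, lr=0.15):
        self.lr, self.trees = lr, []

    def predict(self, x):
        out = np.zeros_like(x, dtype=float)
        for t in self.trees:
            out += self.lr * t.predict(x)
        return out

    def boost(self, x, y):
        # Negative gradient of squared loss is the residual;
        # fit one new stump to it and append to the ensemble.
        self.trees.append(Stump().fit(x, y - self.predict(x)))

rng = np.random.default_rng(0)
model = OnlineGBT()
for _ in range(300):                      # stand-in for an RL training loop
    x = rng.uniform(-1.0, 1.0, size=64)   # "freshly collected" batch
    model.boost(x, np.sin(3.0 * x))       # stand-in for evolving targets
```

Unlike a static GBT library, no full dataset is ever materialized here: each tree only sees the most recent batch, which is the property the paper argues makes boosting compatible with RL's shifting state and reward distributions.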
Lay Summary: Imagine you're teaching a computer to make decisions by learning from trial and error, like learning to play a video game or manage a business. Most artificial intelligence systems today use neural networks, but these struggle with structured data that has clear patterns, like spreadsheets with categories and numbers. Decision trees, which work like "if-then" questions, excel at handling this type of data in traditional machine learning. Our work adapts decision trees for AI agents that learn through trial and error. Instead of neural networks, our system builds an ensemble of decision trees that collectively learn optimal strategies. Testing across various tasks, we found our approach matches neural networks in standard scenarios but significantly outperforms them with structured data. Most importantly, our method proved much more robust when faced with unexpected situations, noisy data, or misleading information—critical advantages for real-world applications where conditions are unpredictable.
Link To Code: https://github.com/NVlabs/gbrl; https://github.com/NVlabs/gbrl_sb3
Primary Area: Reinforcement Learning
Keywords: Gradient Boosting Trees, Reinforcement Learning, Machine Learning
Submission Number: 1505