No-Regret Bandit Exploration based on Soft Tree Ensemble Model

Published: 25 Sept 2024; Last Modified: 06 Nov 2024; NeurIPS 2024 poster; License: CC BY 4.0
Keywords: neural bandits; tree ensemble model; kernel bandits
Abstract: We propose a novel stochastic bandit algorithm that employs reward estimates based on a tree ensemble model. Specifically, we focus on a soft tree model, a variant of the conventional decision tree that has received both practical and theoretical attention in recent years. By deriving several non-trivial properties of soft trees, we extend the analytical techniques used for neural bandit algorithms to our soft tree-based algorithm. We show that our algorithm achieves smaller cumulative regret than existing ReLU-based neural bandit algorithms. We also show that this advantage comes with a trade-off: the hypothesis space of the soft tree ensemble model is more constrained than that of a ReLU-based neural network.
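To make the central object concrete, the sketch below illustrates the general idea of a soft tree ensemble as a reward estimator: each internal node routes an input to its children with a sigmoid-weighted probability rather than a hard split, and the tree outputs the expectation of its leaf values under this soft routing. This is a minimal illustration only, not the authors' exact model or training procedure; the class names (SoftTree, SoftTreeEnsemble), the fixed depth, the averaging over trees, and the random initialization are assumptions made for this example.

```python
# Minimal, illustrative sketch of a soft tree ensemble reward estimator.
# NOT the paper's exact formulation; names and initialization are assumptions.
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class SoftTree:
    """Perfect binary tree of a given depth in which every internal node
    routes the input to its left child with probability sigmoid(w . x)
    instead of making a hard split."""

    def __init__(self, dim, depth, rng):
        self.depth = depth
        self.W = rng.normal(size=(2 ** depth - 1, dim))  # internal-node gating weights
        self.leaf = rng.normal(size=2 ** depth)          # leaf output values

    def predict(self, x):
        # Propagate reaching probabilities level by level; the prediction is
        # the expectation of the leaf values under the soft routing.
        reach = {0: 1.0}                                 # root is reached with probability 1
        for _ in range(self.depth):
            nxt = {}
            for node, p in reach.items():
                g = sigmoid(self.W[node] @ x)            # probability of going left
                nxt[2 * node + 1] = nxt.get(2 * node + 1, 0.0) + p * g
                nxt[2 * node + 2] = nxt.get(2 * node + 2, 0.0) + p * (1.0 - g)
            reach = nxt
        first_leaf = 2 ** self.depth - 1
        return sum(p * self.leaf[i - first_leaf] for i, p in reach.items())


class SoftTreeEnsemble:
    """Average of independently initialized soft trees, giving one scalar
    reward estimate per context/arm feature vector."""

    def __init__(self, dim, depth=3, n_trees=10, seed=0):
        rng = np.random.default_rng(seed)
        self.trees = [SoftTree(dim, depth, rng) for _ in range(n_trees)]

    def predict(self, x):
        return float(np.mean([t.predict(x) for t in self.trees]))


# Example: estimate the reward of a single arm described by a 5-dim feature vector.
ensemble = SoftTreeEnsemble(dim=5)
print(ensemble.predict(np.ones(5)))
```

In a bandit loop, such an estimator would be fit to the observed rewards and combined with an exploration bonus; the paper's contribution concerns the regret analysis of this kind of soft tree-based construction, which the sketch above does not attempt to reproduce.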
Primary Area: Bandits
Submission Number: 13365