Zero-Inflated Bandits

Published: 01 May 2025, Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
Abstract: Many real-world bandit applications are characterized by sparse rewards, which can significantly hinder learning efficiency. Leveraging problem-specific structures for careful distribution modeling is recognized as essential for improving estimation efficiency in statistics. However, this approach remains under-explored in the context of bandits. To address this gap, we initiate the study of zero-inflated bandits, where the reward is modeled using a classic semi-parametric distribution known as the zero-inflated distribution. We develop algorithms based on the Upper Confidence Bound and Thompson Sampling frameworks for this specific structure. The superior empirical performance of these methods is demonstrated through extensive numerical studies.
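The abstract describes modeling rewards with a zero-inflated distribution and running Thompson Sampling on that structure. As an illustrative sketch only (not the paper's algorithm), the following assumes rewards of the form r = z · y, where z ~ Bernoulli(p) indicates a nonzero outcome and y is the reward magnitude; it keeps a Beta posterior on p and a simple Gaussian posterior on the mean of y. All class and variable names here are hypothetical.

```python
import math
import random

class ZeroInflatedTS:
    """Hypothetical Thompson Sampling sketch for zero-inflated rewards.

    Each arm's reward is r = z * y with z ~ Bernoulli(p) (nonzero indicator)
    and y the magnitude of a nonzero reward. We place a Beta(1, 1) prior on p
    and track a running-mean Gaussian posterior (unit observation variance,
    assumed for simplicity) on the mean of y.
    """

    def __init__(self, n_arms):
        self.alpha = [1.0] * n_arms    # Beta successes: nonzero rewards seen
        self.beta = [1.0] * n_arms     # Beta failures: zero rewards seen
        self.mu = [0.0] * n_arms       # posterior mean of the magnitude y
        self.n_nonzero = [0] * n_arms  # count of nonzero observations per arm

    def select_arm(self):
        # Sample p from the Beta posterior and a magnitude mean from the
        # Gaussian posterior; play the arm maximizing the sampled product p * m.
        scores = []
        for a in range(len(self.alpha)):
            p = random.betavariate(self.alpha[a], self.beta[a])
            m = random.gauss(self.mu[a], 1.0 / math.sqrt(self.n_nonzero[a] + 1))
            scores.append(p * m)
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm, reward):
        if reward != 0:
            # Nonzero outcome: update both the Bernoulli and magnitude models.
            self.alpha[arm] += 1
            n = self.n_nonzero[arm]
            self.mu[arm] = (self.mu[arm] * n + reward) / (n + 1)
            self.n_nonzero[arm] += 1
        else:
            # Zero outcome: only the Bernoulli (inflation) component updates.
            self.beta[arm] += 1
```

The key design point the abstract motivates is visible here: a zero reward updates only the inflation component, so the sparse zeros do not dilute the estimate of the nonzero-reward magnitude.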
Lay Summary: Many real-world computer systems need to make smart choices but often receive very little feedback about whether their decisions were good or bad. For example, in online advertising, most customers will not click the advertisement, so the reward is zero with high probability. Our research introduces a new way to handle situations where feedback is very rare and develops new algorithms that can learn more effectively in these challenging settings. Our findings have implications for designing better learning systems in such scenarios.
Primary Area: Theory->Online Learning and Bandits
Keywords: Bandits, Zero-Inflated, Exploration
Submission Number: 14608