TL;DR: This paper constructs a systematic framework for tackling interference in contextual bandit problems with multiple units per round, offering comprehensive theoretical guarantees and connecting online learning with causality.
Abstract: Interference, a key concept in causal inference, extends the reward modeling process by accounting for the impact of one unit's actions on the rewards of others. In contextual bandit (CB) settings where multiple units are present in the same round, interference can significantly affect the estimation of expected rewards for different arms, thereby influencing the decision-making process. Although some prior work has explored multi-agent and adversarial bandits in interference-aware settings, modeling interference in CB remains significantly underexplored. In this paper, we introduce a systematic framework to address interference in Linear CB (LinCB), bridging the gap between causal inference and online decision-making. We propose a series of algorithms that explicitly quantify the interference effect in the reward modeling process and provide comprehensive theoretical guarantees, including sublinear regret bounds, finite-sample upper bounds, and asymptotic properties. The effectiveness of our approach is demonstrated through simulations and a synthetic dataset generated from MovieLens data.
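To make the setting concrete, below is a minimal sketch of a LinUCB-style bandit in which each round contains several units and the reward model carries an extra interference feature (here, the mean action of the other units). All names, the feature map, and the exact reward model are illustrative assumptions for exposition, not the paper's LinCBWI specification.

```python
import numpy as np

rng = np.random.default_rng(0)
n_units, n_arms, d, T = 4, 2, 3, 200

# Hypothetical ground truth: reward is linear in own context-arm
# features plus a coefficient on the fraction of OTHER units choosing
# arm 1 (the interference effect). Illustrative only.
theta_true = rng.normal(size=d * n_arms)
gamma_true = 1.5

def features(x, a):
    """Block one-hot feature map phi(x, a) for a linear reward model."""
    phi = np.zeros(d * n_arms)
    phi[a * d:(a + 1) * d] = x
    return phi

p = d * n_arms + 1          # +1 dimension for the interference feature
A = np.eye(p)               # ridge-regularized Gram matrix
b = np.zeros(p)
alpha = 1.0                 # UCB exploration width

for t in range(T):
    X = rng.normal(size=(n_units, d))
    theta_hat = np.linalg.solve(A, b)
    A_inv = np.linalg.inv(A)
    actions = np.zeros(n_units, dtype=int)
    # Greedy per-unit UCB, treating the other units' tentative actions
    # as the interference input (a simplification of joint optimization).
    for i in range(n_units):
        others = np.arange(n_units) != i
        best_a, best_ucb = 0, -np.inf
        for a in range(n_arms):
            z = np.append(features(X[i], a), actions[others].mean())
            ucb = z @ theta_hat + alpha * np.sqrt(z @ A_inv @ z)
            if ucb > best_ucb:
                best_a, best_ucb = a, ucb
        actions[i] = best_a
    # Observe rewards and update the shared ridge estimate.
    for i in range(n_units):
        frac = actions[np.arange(n_units) != i].mean()
        z = np.append(features(X[i], actions[i]), frac)
        r = z[:-1] @ theta_true + gamma_true * frac + rng.normal(scale=0.1)
        A += np.outer(z, z)
        b += r * z

theta_hat = np.linalg.solve(A, b)  # last coordinate estimates interference
```

The last coordinate of `theta_hat` plays the role of the quantified interference effect; the paper's algorithms replace this hand-rolled feature with a principled interference model and supply regret and asymptotic guarantees.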
Lay Summary: Ever noticed how watching a movie with someone can change how much you enjoy it? Or how one person’s choice to quarantine can affect the health of those around them? These kinds of ripple effects, known as "interference" in causal inference, are everywhere in real life.
In many real-world scenarios like these, existing methods do not make decisions with the goal of maximizing overall outcomes. Instead, they often handle individuals one at a time and ignore how people might influence each other. Moreover, these methods often fail to capture the personalized differences that matter in decision-making.
We set out to change that. Our research introduces a simple yet effective structure to capture interference effects, allowing long-term learning systems like bandits to adapt more intelligently as they collect experience. Through rigorous theoretical analysis, we show that our approach is not only intuitive but also statistically grounded, with clear uncertainty quantification and strong performance guarantees.
Our findings show that this broader framework consistently outperforms traditional linear contextual bandit methods. It enables more coordinated, robust, and effective decision-making across systems where individuals interact and affect one another.
Link To Code: https://github.com/YangXU63/LinCBWI
Primary Area: General Machine Learning->Causality
Keywords: Contextual Bandits, Interference, Causality, Multi-agent, Asymptotics, Sublinear regret
Flagged For Ethics Review: true
Submission Number: 4971