Abstract: Addressing non-stationarity and latent variables in bandit algorithms presents significant challenges. This paper tackles both challenges simultaneously in Multi-Agent Multi-Armed Bandits by integrating causal inference principles with panel data methodologies. We propose Dynamic Causal Multi-Armed Bandits (DCMAB) and Dynamic Causal Contextual Bandits (DCCB), which focus on treatment effect estimation rather than direct reward modeling. By applying matrix completion to agent-time reward matrices, our algorithms effectively leverage shared information across agents while adapting to dynamic environments. We establish sub-linear regret bounds for the proposed algorithms and extend their applicability to scenarios with time-varying treatment effects. Through extensive simulations and a real-world application to stock market data, we demonstrate the advantages of our methods in non-stationary bandits with latent variables.
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Olivier_Cappé2
Submission Number: 4805