A Bayesian Multi-agent Multi-arm Bandit Framework for Optimal Decision Making in Dynamically Changing Environments

ICLR 2026 Conference Submission19963 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Multi-Armed Bandits; Decision-Making; Multi-Agent Systems; Bayesian Inference
Abstract: We introduce DAMAS (Dynamic Adaptation through Multi-Agent Systems), a novel framework for decision-making in non-stationary environments characterized by varying reward distributions and dynamic constraints. Our framework integrates a multi-agent system with Multi-Armed Bandit (MAB) algorithms and Bayesian updates, enabling each agent to specialize in a particular environmental state. DAMAS continuously estimates the probability of being in each state using only reward observations, allowing rapid adaptation to changing conditions without the need for explicit context features. Our evaluation of DAMAS included both synthetic environments and real-world web server workloads. Our results show that DAMAS outperforms state-of-the-art methods, reducing regret by around 40% and achieving a higher probability of selecting the best action.
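The abstract describes the core mechanism: one bandit agent per assumed environmental state, plus a Bayesian belief over which state is active that is updated from reward observations alone. A minimal sketch of that idea is below; all names, the number of states and arms, the Bernoulli reward model, and the Thompson-sampling agents are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sketch of the idea from the abstract: several bandit
# "agents", one per assumed environment state, and a Bayesian belief
# over which state is active, updated using only observed rewards.
n_states, n_arms = 2, 3

# Hidden per-state Bernoulli reward means (the environment, unknown to agents).
true_means = np.array([[0.9, 0.2, 0.1],
                       [0.1, 0.2, 0.9]])

# One Beta(1,1) posterior per (state, arm): each row is a Thompson-sampling agent.
alpha = np.ones((n_states, n_arms))
beta = np.ones((n_states, n_arms))

belief = np.full(n_states, 1.0 / n_states)  # P(environment is in each state)

def step(active_state):
    global belief
    # Blend each agent's Thompson sample by the belief in its state, then act.
    samples = rng.beta(alpha, beta)              # shape (n_states, n_arms)
    arm = int(np.argmax(belief @ samples))
    reward = int(rng.random() < true_means[active_state, arm])
    # Belief update from the reward alone: likelihood of this reward
    # under each agent's current posterior-mean estimate of the pulled arm.
    means = alpha[:, arm] / (alpha[:, arm] + beta[:, arm])
    lik = means if reward else 1.0 - means
    belief = belief * lik
    belief /= belief.sum()
    # Update each agent's posterior, weighted by its state probability,
    # so the agent believed to be active learns fastest.
    alpha[:, arm] += belief * reward
    beta[:, arm] += belief * (1 - reward)
    return arm, reward

for t in range(2000):
    step(0 if t < 1000 else 1)  # environment switches state halfway through
```

The belief acts as a soft gate: when the environment switches, rewards become unlikely under the previously dominant agent, so probability mass shifts to the agent whose posterior better explains the new observations, without any explicit context features.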
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 19963