Track: Research Track
Keywords: Multi-Armed Bandits, Online Learning
Abstract: We study an online learning setting where an agent's actions are constrained to local movements on a dynamic graph, a setting that captures scenarios such as autonomous reconnaissance. This problem highlights a core challenge in adaptive systems: how to learn effectively with only partial, localized feedback in a non-stationary environment. We propose a set of structural conditions, termed \textit{Recurrent Reachability} and \textit{Temporal Stability}, that are sufficient for learnability. Our analysis reveals a foundational \textit{anatomy of regret}, decomposing it into a statistical learning cost and a physical navigation cost. We introduce a family of local algorithms, progressing from a canonical protocol to a more practical, adaptive variant, and culminating in a reward-aware exploration policy that achieves provably near-optimal regret on any graph sequence satisfying our conditions. We corroborate our theory in a disaster-response simulation.
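The regret decomposition mentioned in the abstract can be sketched in notation (a hypothetical rendering; the symbols and the additive form are illustrative, not taken from the paper):

$$
R_T \;=\; \underbrace{R_T^{\mathrm{stat}}}_{\text{statistical learning cost}} \;+\; \underbrace{R_T^{\mathrm{nav}}}_{\text{physical navigation cost}}
$$

Here $R_T$ would denote cumulative regret over horizon $T$, with the statistical term capturing the cost of estimating rewards from partial, localized feedback and the navigation term capturing the cost of physically traversing the dynamic graph to reach high-reward nodes.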
Submission Number: 144