Regret Lower Bounds for Decentralized Multi-Agent Stochastic Shortest Path Problems

Utkarsh U. Chavan; Prashant Trivedi; Nandyala Hemachandra

Regret Lower Bounds for Decentralized Multi-Agent Stochastic Shortest Path Problems

Utkarsh U. Chavan, Prashant Trivedi, Nandyala Hemachandra

Published: 18 Sept 2025, Last Modified: 29 Oct 2025NeurIPS 2025 posterEveryoneRevisionsBibTeXCC BY 4.0

Keywords: Lower Bounds, Decentralized MARL, Stochastic Shortest Path, Reinforcement Learning, Feature Design

TL;DR: A first lower bounds for regret of decentralized multi-agent-SSP.

Abstract: Multi-agent systems (MAS) are central to applications such as swarm robotics and traffic routing, where agents must coordinate in a decentralized manner to achieve a common objective. Stochastic Shortest Path (SSP) problems provide a natural framework for modeling decentralized control in such settings. While the problem of learning in SSP has been extensively studied in single-agent settings, the decentralized multi-agent variant remains largely unexplored. In this work, we take a step towards addressing that gap. We study decentralized multi-agent SSPs (Dec-MASSPs) under linear function approximation, where the transition dynamics and costs are represented using linear models. Applying novel symmetry-based arguments, we identify the structure of optimal policies. Our main contribution is the first regret lower bound for this setting based on the construction of hard-to-learn instances for any number of agents, $n$. Our regret lower bound of $\Omega(\sqrt{K})$, over $K$ episodes, highlights the inherent learning difficulty in Dec-MASSPs. These insights clarify the learning complexity of decentralized control and can further guide the design of efficient learning algorithms in multi-agent systems.

Primary Area: Reinforcement learning (e.g., decision and control, planning, hierarchical RL, robotics)

Submission Number: 13417

Loading