Keywords: Partially Observable Markov Games, Policy Regret, Weakly-Revealing, Observable Operator Model
TL;DR: We give the first $\mathcal{O}(\sqrt{T})$ policy regret guarantee for partially observable Markov games against a strategic adaptive opponent.
Abstract: We study policy regret minimization in partially observable Markov games (POMGs) between a learner and a strategic adaptive opponent who adapts to the learner's past strategies. We develop a model-based optimistic framework that operates on the learner-observable process using a \emph{joint} MLE confidence set, and we introduce an Observable Operator Model-based causal decomposition that disentangles the coupling between the world model and the adversary model. Under multi-step weakly revealing observations and a bounded-memory, stationary, and posterior-Lipschitz opponent, we prove an $\mathcal{O}(\sqrt{T})$ policy regret bound. This work extends regret analysis from Markov games to POMGs and provides the first policy regret guarantee under imperfect information against an adaptive opponent.
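The abstract does not give the formal objective; as a rough sketch, one standard way to formalize policy regret against an opponent that adapts to the learner's past strategies is shown below (the symbols $\Pi$, $V$, and the adaptation maps $f_t$ are illustrative assumptions, not notation taken from the paper):
$$
\mathrm{PolicyReg}(T) \;=\; \max_{\pi \in \Pi} \sum_{t=1}^{T} V\!\big(\pi,\; f_t(\pi,\dots,\pi)\big) \;-\; \sum_{t=1}^{T} V\!\big(\pi_t,\; f_t(\pi_1,\dots,\pi_{t-1})\big),
$$
where $\pi_1,\dots,\pi_T$ are the learner's played policies, $f_t$ maps the learner's first $t-1$ policies to the opponent's strategy at round $t$, and $V(\pi,\nu)$ is the learner's expected return in the POMG when the learner plays $\pi$ and the opponent plays $\nu$. The comparator term evaluates the opponent's counterfactual adaptation to the fixed policy $\pi$, which is what distinguishes policy regret from external regret.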
Primary Area: learning theory
Submission Number: 21484