Solving zero-sum one-sided partially observable stochastic games

Published: 01 Jan 2023, Last Modified: 14 Oct 2024Artif. Intell. 2023EveryoneRevisionsBibTeXCC BY-SA 4.0
Abstract: Many real-world situations are dynamic, with long-term interactions between multiple agents with uncertainty and limited observations. The agents must reason about which actions to take while also predicting and learning about what actions the other agents will take and how their choices will interact. In the most general setting, there is no limitation on the length of the sequence of actions the agent can perform — that is, there is no fixed horizon that can be used as an endpoint for analysis. These settings can be modeled as partially observable stochastic games (POSGs). Many adversarial domains (e.g., security settings) can be modeled as strictly competitive (or zero-sum) variants of these games.While these models are capable of modeling a wide variety of realistic problems, solving general POSGs is computationally intractable, so we focus on a broad subclass of POSGs called one-sided POSGs. In these games, only one agent has imperfect information while their opponent has full knowledge of the current situation. We provide a complete approach for solving zero-sum, one-sided POSGs: we (1) give a theoretical analysis of one-sided POSGs and their value functions, (2) show that a variant of a value-iteration algorithm converges in this setting, (3) adapt the heuristic search value-iteration algorithm for solving one-sided POSGs, (4) describe how to use approximate value functions to derive strategies in the game, and (5) experimentally demonstrate that our algorithm can solve one-sided POSGs of non-trivial sizes and analyze the scalability of our algorithm in three different domains: pursuit-evasion, patrolling, and search games.
Loading

OpenReview is a long-term project to advance science through improved peer review with legal nonprofit status. We gratefully acknowledge the support of the OpenReview Sponsors. © 2025 OpenReview