Abstract: Research in autonomous agent planning is gradually moving from single-agent environments to those populated by multiple agents. In single-agent sequential environments, partially observable Markov decision processes (POMDPs) provide a principled approach to planning under uncertainty. They improve on classical planning not only by modeling the inherent non-determinism of the problem domain, but also by producing "universal" plans, or policies, which represent complete control mechanisms. These advantages motivate us to generalize POMDPs from their traditional single-agent setting to an environment populated by several interacting autonomous agents. The formalism of Markov decision processes has previously been extended to multiple agents, giving rise to stochastic games, also known as Markov games. Other extensions of POMDPs to multiagent environments have also appeared and are called DEC-POMDPs (Bernstein et al. 2002) in the literature. Both of these formalisms employ the solution concept of Nash equilibrium. Specifically, solutions are plans (policies) that are in mutual equilibrium with each other. However, while Nash equilibria are useful for describing a multiagent system when, and if, it has reached a stable state, this solution concept is not sufficient as a general control paradigm. The main reasons are that there may be multiple equilibria with no clear way to choose among them (nonuniqueness), and that equilibria do not specify actions in cases where agents believe that other agents may not act according to their equilibrium strategies (incompleteness). Furthermore, the intermediate stages before a Nash equilibrium is reached are, at present, inadequately understood.
In this thesis, we present a new framework called Interactive POMDPs (I-POMDPs) for optimal planning by an agent that interacts with other autonomous agents in a sequential environment and maximizes a reward that depends on the joint actions of all agents. As expected, generalizing POMDPs from a single-agent setting to multiple agents is not trivial. In addition to maintaining beliefs about the physical environment, each agent must also maintain beliefs about the other agents: their sensing capabilities, be-