H2IL-MBOM: A Hierarchical World Model Integrating Intent and Latent Strategy for Opponent Modeling in Multi-UAV Game
Keywords: Multi-UAV game, Multi-agent reinforcement learning, Opponent modeling, World model
Abstract: In mixed cooperative-competitive scenarios, the uncertain decisions made by agents on both sides not only render learning non-stationary but also pose significant threats to each side's safety. Existing methods typically predict policy beliefs from opponents' actions, goals, and rewards, or predict trajectories and intentions solely from local historical observations. However, such private information is typically unavailable, and these methods neglect the intrinsic dynamics between mental states, actions, and trajectories on both sides. Inspired by human cognitive processes, which hierarchically infer intentions from opponents' historical trajectories, reason about latent strategies based on teammates' historical responses, and simulate the co-evolution of mental states and trajectories, we propose a Hierarchical Interactive Intent-Latent-Strategy-Aware World Model-based Opponent Model (H2IL-MBOM) and Mutual Self-Observed Adversary Reasoning PPO (MSOAR-PPO). These methods enable recursive, hierarchical prediction of opponents' intentions, latent strategies, and trajectories, while establishing a dynamic co-adaptation loop between the world model and the policy. Specifically, the high-level world model takes observations relative to opponents and uses learnable multi-intention queries to predict future intentions and trajectories. It then passes these inferred intentions to the low-level world model, which uses them to predict how the opponents' learnable latent strategies react and influence the trajectories of cooperative agents. Our method achieves faster convergence and higher rewards than state-of-the-art model-free and model-based reinforcement learning and opponent modeling approaches in multi-UAV games, demonstrating strong scalability up to 10 vs. 10 with improved win and survival rates. Cumulative error analysis and t-SNE visualizations verify effective reasoning about opponents' multiple intentions and latent strategies.
It also outperforms existing methods on the StarCraft Multi-Agent Challenge and Google Research Football benchmarks in the majority of scenarios. Videos are available in the supplementary materials.
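To make the hierarchical inference concrete, the sketch below illustrates the two-level flow described in the abstract: a high-level module matches learnable multi-intention queries against opponent-relative observations to produce intention embeddings and opponent trajectory predictions, and a low-level module conditions on those intentions to predict latent strategies and cooperative agents' trajectories. This is a minimal, illustrative reading of the abstract only; all class names, dimensions, and the attention-style weighting are assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumptions, not from the paper).
OBS_DIM, N_INTENTS, INTENT_DIM, LATENT_DIM, TRAJ_DIM, HORIZON = 16, 4, 8, 8, 4, 5

class HighLevelWorldModel:
    """Infers opponent intentions from opponent-relative observations
    via learnable multi-intention queries (attention-style weighting)."""
    def __init__(self):
        self.intent_queries = rng.normal(size=(N_INTENTS, INTENT_DIM))  # learnable
        self.W_obs = rng.normal(size=(OBS_DIM, INTENT_DIM)) * 0.1
        self.W_traj = rng.normal(size=(INTENT_DIM, HORIZON * TRAJ_DIM)) * 0.1

    def forward(self, obs_rel):
        keys = obs_rel @ self.W_obs                        # (T, INTENT_DIM)
        scores = self.intent_queries @ keys.T              # (N_INTENTS, T)
        attn = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
        intents = attn @ keys                              # (N_INTENTS, INTENT_DIM)
        opp_traj = (intents @ self.W_traj).reshape(N_INTENTS, HORIZON, TRAJ_DIM)
        return intents, opp_traj

class LowLevelWorldModel:
    """Conditions on inferred intentions to predict opponents' latent
    strategies and the resulting cooperative-agent trajectories."""
    def __init__(self):
        self.W_lat = rng.normal(size=(INTENT_DIM, LATENT_DIM)) * 0.1
        self.W_ally = rng.normal(size=(LATENT_DIM, HORIZON * TRAJ_DIM)) * 0.1

    def forward(self, intents):
        latent = np.tanh(intents @ self.W_lat)             # (N_INTENTS, LATENT_DIM)
        ally_traj = (latent.mean(axis=0) @ self.W_ally).reshape(HORIZON, TRAJ_DIM)
        return latent, ally_traj

# Hierarchical rollout on a dummy 10-step observation history.
obs_hist = rng.normal(size=(10, OBS_DIM))
intents, opp_traj = HighLevelWorldModel().forward(obs_hist)
latent, ally_traj = LowLevelWorldModel().forward(intents)
print(opp_traj.shape, ally_traj.shape)   # (4, 5, 4) (5, 4)
```

In a trained system these linear maps would be neural networks optimized jointly with the policy, closing the co-adaptation loop the abstract describes; here random weights only demonstrate the data flow between the two levels.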
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 12092