H2IL-MBOM: A Hierarchical World Model Integrating Intent and Latent Strategy for Opponent Modeling in Multi-UAV Game
Keywords: Multi-UAV game, Multi-agent reinforcement learning, Opponent modeling, World model
Abstract: In mixed cooperative-competitive scenarios, the uncertain decisions made by agents on both sides not only render learning non-stationary but also pose significant threats to each side's safety. Existing methods typically predict policy beliefs from opponents' actions, goals, and rewards, or predict trajectories and intentions solely from local historical observations. However, such private information is typically unavailable, and these methods neglect the intrinsic dynamics between mental states, actions, and trajectories on both sides. Inspired by human cognitive processes, which hierarchically infer intentions from opponents' historical trajectories, reason about latent strategies based on teammates' historical responses, and simulate the co-evolution of mental states and trajectories, we propose a Hierarchical Interactive Intent-Latent-Strategy-Aware World Model-based Opponent Model (H2IL-MBOM) and Mutual Self-Observed Adversary Reasoning PPO (MSOAR-PPO). These methods enable recursive, hierarchical prediction of opponents' intentions, latent strategies, and trajectories, while establishing a dynamic co-adaptation loop between the world model and the policy. Specifically, the high-level world model takes observations relative to opponents and uses learnable multi-intention queries to predict future intentions and trajectories. It then passes these inferred intentions to the low-level world model, which uses them to predict how the opponents' learnable latent strategies react and influence the trajectories of cooperative agents. Our method achieves faster convergence and higher rewards than state-of-the-art model-free and model-based reinforcement learning and opponent modeling approaches in multi-UAV games, demonstrating strong scalability up to 10 vs. 10 with improved win and survival rates. Cumulative error analysis and t-SNE visualizations verify effective reasoning about opponents' multiple intentions and latent strategies.
It also outperforms existing methods on the StarCraft Multi-Agent Challenge and Google Research Football benchmarks in the majority of scenarios. Videos are available in the supplementary materials.
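To make the hierarchical inference concrete, the sketch below illustrates the two-level flow described in the abstract: a high-level module matches learnable multi-intention queries against opponent-relative observations to produce intention embeddings and opponent trajectory predictions, and a low-level module conditions on those intentions to predict latent strategies and cooperative agents' trajectories. This is a minimal, illustrative reading of the abstract only; all class names, dimensions, and the attention-style weighting are assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumptions, not from the paper).
OBS_DIM, N_INTENTS, INTENT_DIM, LATENT_DIM, TRAJ_DIM, HORIZON = 16, 4, 8, 8, 4, 5

class HighLevelWorldModel:
    """Infers opponent intentions from opponent-relative observations
    via learnable multi-intention queries (attention-style weighting)."""
    def __init__(self):
        self.intent_queries = rng.normal(size=(N_INTENTS, INTENT_DIM))  # learnable
        self.W_obs = rng.normal(size=(OBS_DIM, INTENT_DIM)) * 0.1
        self.W_traj = rng.normal(size=(INTENT_DIM, HORIZON * TRAJ_DIM)) * 0.1

    def forward(self, obs_rel):
        keys = obs_rel @ self.W_obs                        # (T, INTENT_DIM)
        scores = self.intent_queries @ keys.T              # (N_INTENTS, T)
        attn = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
        intents = attn @ keys                              # (N_INTENTS, INTENT_DIM)
        opp_traj = (intents @ self.W_traj).reshape(N_INTENTS, HORIZON, TRAJ_DIM)
        return intents, opp_traj

class LowLevelWorldModel:
    """Conditions on inferred intentions to predict opponents' latent
    strategies and the resulting cooperative-agent trajectories."""
    def __init__(self):
        self.W_lat = rng.normal(size=(INTENT_DIM, LATENT_DIM)) * 0.1
        self.W_ally = rng.normal(size=(LATENT_DIM, HORIZON * TRAJ_DIM)) * 0.1

    def forward(self, intents):
        latent = np.tanh(intents @ self.W_lat)             # (N_INTENTS, LATENT_DIM)
        ally_traj = (latent.mean(axis=0) @ self.W_ally).reshape(HORIZON, TRAJ_DIM)
        return latent, ally_traj

# Hierarchical rollout on a dummy 10-step observation history.
obs_hist = rng.normal(size=(10, OBS_DIM))
intents, opp_traj = HighLevelWorldModel().forward(obs_hist)
latent, ally_traj = LowLevelWorldModel().forward(intents)
print(opp_traj.shape, ally_traj.shape)   # (4, 5, 4) (5, 4)
```

In a trained system these linear maps would be neural networks optimized jointly with the policy, closing the co-adaptation loop the abstract describes; here random weights only demonstrate the data flow between the two levels.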
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 12092