COMBAT: Conditional World Models for Behavioral Agent Training

Anmol Agarwal; Pranay Meshram; Sumer Singh; Saurav Suman; Andrew Lapp; Shahbuland Matiana; Louis Castricato; Spencer Frazier

Published: 02 Oct 2025, Last Modified: 10 Oct 2025, RIWM Non Archival, CC BY 4.0

Keywords: World Models in Gaming, Diffusion Models, Auto-regressive World models, Video generation, neural game engines, Controllable video generation, variational autoencoders, diffusion transformer, agents

TL;DR: This paper presents COMBAT, a conditional world model that learns a reactive opponent's tactical policy as an emergent property by training a real-time diffusion model on a fighting game using only a single player's actions for supervision.

Abstract: Recent advances in video generation have spurred the development of world models capable of simulating 3D-consistent environments and interactions with static objects. However, a significant limitation remains in their ability to model dynamic, reactive agents that can intelligently influence and interact with the world. To address this gap, we introduce COMBAT, a real-time, action-controlled world model trained on the complex 1v1 fighting game Tekken 3. Our work demonstrates that diffusion models can successfully simulate a dynamic opponent that reacts to player actions, learning its behavior implicitly. Our approach utilizes a 1.2 billion parameter Diffusion Transformer, conditioned on latent representations from a deep compression autoencoder. We employ state-of-the-art techniques, including causal distillation and diffusion forcing, to achieve real-time inference. Crucially, we observe the emergence of sophisticated agent behavior by training the model solely on single-player inputs, without any explicit supervision for the opponent's policy. Unlike traditional imitation learning methods, which require complete action labels, COMBAT learns effectively from partially observed data to generate opponent behaviors that respond to a controllable Player 1. We present an extensive study and introduce novel evaluation methods to benchmark this emergent agent behavior, establishing a strong foundation for training interactive agents within diffusion-based world models.

Supplementary Material: zip

Submission Number: 10
