JointDiff: Bridging Continuous and Discrete in Multi-Agent Trajectory Generation

ICLR 2026 Conference Submission 9796 Authors

Published: 26 Jan 2026, Last Modified: 26 Jan 2026 · ICLR 2026 · CC BY 4.0
Keywords: Multi-Agent, Diffusion, Controllable, Trajectory
Abstract: Generative models often treat continuous data and discrete events as separate processes, creating a gap in modeling complex systems where they interact synchronously. To bridge this gap, we introduce $\textbf{JointDiff}$, a novel diffusion framework designed to unify these two processes by simultaneously generating continuous spatio-temporal data and synchronous discrete events. We demonstrate its efficacy in the sports domain by simultaneously modeling multi-agent trajectories and key possession events. This joint modeling is validated with non-controllable generation and two novel controllable generation scenarios: $\textit{weak-possessor-guidance}$, which offers flexible semantic control over game dynamics through a simple list of intended ball possessors, and $\textit{text-guidance}$, which enables fine-grained, language-driven generation. To enable the conditioning with these guidance signals, we introduce $\textbf{CrossGuid}$, an effective conditioning operation for multi-agent domains. We also share a new unified sports benchmark enhanced with textual descriptions for soccer and football datasets. JointDiff achieves state-of-the-art performance, demonstrating that joint modeling is crucial for building realistic and controllable generative models for interactive systems.
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 9796
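
The abstract does not specify the training objective, so the following is a minimal sketch of what a joint continuous-discrete diffusion training step could look like: a DDPM-style Gaussian forward process on the trajectories and a simple uniform-corruption process on the synchronous event labels, denoised by one shared network. All names (`JointDenoiser`, `joint_training_step`), shapes, noise schedules, and hyperparameters are illustrative assumptions, not taken from the paper; the actual JointDiff architecture and its CrossGuid conditioning are not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointDenoiser(nn.Module):
    """Toy shared denoiser: predicts continuous noise and event logits jointly.
    (Illustrative stand-in for the paper's model, which is not specified here.)"""
    def __init__(self, n_agents, coord_dim, n_events, hidden=128):
        super().__init__()
        in_dim = n_agents * coord_dim + n_events + 1  # trajectories + event one-hot + timestep
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
        )
        self.eps_head = nn.Linear(hidden, n_agents * coord_dim)  # Gaussian noise prediction
        self.event_head = nn.Linear(hidden, n_events)            # categorical event logits

    def forward(self, x_t, e_t, t):
        h = torch.cat([x_t.flatten(1), e_t, t[:, None].float()], dim=-1)
        h = self.net(h)
        return self.eps_head(h).view_as(x_t), self.event_head(h)

T = 100  # assumed number of diffusion steps
betas = torch.linspace(1e-4, 0.02, T)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)

def joint_training_step(model, x0, e0, n_events):
    """One hybrid-diffusion loss step: Gaussian noise on trajectories,
    uniform categorical corruption on the synchronous event labels."""
    B = x0.shape[0]
    t = torch.randint(0, T, (B,))
    # continuous branch: standard DDPM forward q(x_t | x_0)
    eps = torch.randn_like(x0)
    ab = alpha_bar[t].view(B, 1, 1)
    x_t = ab.sqrt() * x0 + (1 - ab).sqrt() * eps
    # discrete branch: with probability growing in t, replace the event label
    corrupt = torch.rand(B) < (t.float() + 1) / T
    e_noisy = torch.where(corrupt, torch.randint(0, n_events, (B,)), e0)
    e_t = F.one_hot(e_noisy, n_events).float()
    eps_hat, logits = model(x_t, e_t, t)
    # joint objective: noise regression + event reconstruction
    return F.mse_loss(eps_hat, eps) + F.cross_entropy(logits, e0)

# usage: 8 snapshots of 22 agents in 2-D with 5 event types (all shapes illustrative)
model = JointDenoiser(n_agents=22, coord_dim=2, n_events=5)
loss = joint_training_step(model, torch.randn(8, 22, 2), torch.randint(0, 5, (8,)), 5)
loss.backward()
```

The key design choice this sketch mirrors is that a single denoiser sees both modalities at every step, so the predicted trajectories and events can stay mutually consistent, rather than being generated by two decoupled models.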