SyMoFlow: Interaction-Aware Motion Synthesis from Text via Symmetric Flows

ICLR 2026 Conference Submission17360 Authors

19 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: Motion Synthesis, Human-human Interaction Generation, Discrete Flow Matching, Text-driven Generation
TL;DR: SyMoFlow is a text-driven framework that generates two-person interactions by sequentially synthesizing each agent's motion using symmetric flows.
Abstract: Human-Human Interaction (HHI) generation aims to synthesize plausible and coordinated motion sequences for multiple agents in a shared environment. Existing approaches often struggle to capture reciprocal dependencies, maintain semantic alignment with textual descriptions, or balance realism and diversity. To address these challenges, we propose SyMoFlow, a text-driven motion synthesis framework that leverages an interaction-symmetric decomposition of the joint motion distribution. SyMoFlow generates the two agents' motions sequentially: it first produces an interaction-aware motion for one agent conditioned on the text, then synthesizes the second agent’s motion conditioned on the first, capturing both the prior action and the reciprocal reaction. By explicitly modeling these interdependent dynamics, our approach produces coordinated, causally consistent behaviors, while flexible flow-based sampling enhances multimodality and diversity. Extensive experiments on the InterHuman and InterX benchmarks demonstrate that SyMoFlow achieves state-of-the-art realism and text alignment while significantly improving the diversity of plausible interactions.
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 17360
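Note on the decomposition: the sequential scheme described in the abstract can be read as a chain-rule factorization of the joint two-person motion distribution. The sketch below uses notation assumed here for illustration (x_a and x_b for the two agents' motion sequences, c for the text prompt); the abstract itself gives no formula. The second line only records that the chain rule holds in either agent order, which is one plausible reading of "interaction-symmetric" and not a statement of the authors' exact formulation.

```latex
% Assumed notation: x_a, x_b are the two agents' motions, c is the text condition.
\begin{align}
  p(x_a, x_b \mid c) &= p(x_a \mid c)\, p(x_b \mid x_a, c) \\
                     &= p(x_b \mid c)\, p(x_a \mid x_b, c)
\end{align}
```

Under this reading, sampling draws the first agent's motion from the text-conditioned term and then draws the second agent's motion from the term conditioned on both the text and the first agent.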