CODA: Commonsense-Driven Autoregressive Human Interaction Generation

Chengjian Li; Xiangbo Shu; Haifeng Xia; Meiqi Cao; Guo-Sen Xie; Yazhou Yao; Jinhui Tang

04 Sept 2025 (modified: 13 Nov 2025); ICLR 2026 Conference Withdrawn Submission; CC BY 4.0

Keywords: Human interaction generation, Interactive Codebook Storager, Commonsense Constraint Loss

Abstract: Human interaction generation (HIG) aims to synthesize commonsense-plausible interaction motion from textual descriptions. However, most existing generation methods, such as diffusion and autoregressive models, overlook explicit commonsense constraints, which leads to implausible motion artifacts such as bone stretching or body penetration. To address these issues, this work proposes a novel learning paradigm, CODA, with two core components: an Interactive Codebook Storager (ICS) and a Commonsense Constraint Loss (CCL). Specifically, ICS captures and stores commonsense features of single-person motion and human-human interaction to support high-quality motion generation. Building on ICS, CCL constrains single-person joint trajectories, regulates the center-of-mass position, and applies distance and collision constraints in multi-person interactions, suppressing motion artifacts and explicitly enforcing commonsense plausibility. Extensive experiments suggest that CODA generates higher-quality interactions than existing state-of-the-art methods.
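The abstract does not give the form of the CCL terms, so the following is only a minimal illustrative sketch of how two such penalties, one against bone stretching and one against inter-person penetration, could be written. The function names, the toy three-joint skeleton, the squared-error and hinge forms, and the `min_dist` threshold are all assumptions made for illustration, not the paper's formulation.

```python
import math

# Hypothetical toy skeleton: pairs of joint indices that form bones.
# This is an assumption for illustration, not the skeleton used in the paper.
BONES = [(0, 1), (1, 2)]


def bone_length_penalty(frames, bones, rest_lengths):
    """Mean squared deviation of each bone's length from its rest length.

    frames: list of frames, each a list of (x, y, z) joint positions.
    """
    total, count = 0.0, 0
    for joints in frames:
        for (a, b), rest in zip(bones, rest_lengths):
            length = math.dist(joints[a], joints[b])
            total += (length - rest) ** 2
            count += 1
    return total / count if count else 0.0


def collision_penalty(frames_a, frames_b, min_dist):
    """Mean squared hinge penalty on cross-person joint pairs closer than min_dist."""
    total, count = 0.0, 0
    for joints_a, joints_b in zip(frames_a, frames_b):
        for p in joints_a:
            for q in joints_b:
                # Only pairs that are closer than min_dist contribute.
                total += max(0.0, min_dist - math.dist(p, q)) ** 2
                count += 1
    return total / count if count else 0.0


# One frame of a straight three-joint chain with unit-length bones.
person_a = [[(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 2.0, 0.0)]]
# The same chain with the second bone stretched to length 2.
stretched = [[(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 3.0, 0.0)]]
# A second person standing five units away along x.
person_b = [[(5.0, 0.0, 0.0), (5.0, 1.0, 0.0), (5.0, 2.0, 0.0)]]
```

In this toy setup the unstretched chain and the well-separated pair incur zero penalty, while the stretched chain and a fully overlapping pair incur positive penalties. In a training loop, terms like these would typically be weighted and added to the main generation objective; the weights would be further assumptions.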

Supplementary Material: zip

Primary Area: applications to computer vision, audio, language, and other modalities

Submission Number: 1955
