Keywords: Human interaction generation, Interactive Codebook Storager, Commonsense Constraint Loss
Abstract: Human interaction generation (HIG) aims to synthesize commonsense-plausible interaction motion from textual descriptions. However, most existing generation methods, such as diffusion and autoregressive models, overlook explicit commonsense constraints, leading to implausible motion artifacts such as bone stretching or penetration. To address these issues, this work proposes CODA, a novel learning paradigm with two core components: the Interactive Codebook Storager (ICS) and the Commonsense Constraint Loss (CCL). Specifically, ICS captures and stores commonsense features of single-person motion and human-human interaction to support high-quality motion generation. Building on this, CCL constrains single-person joint trajectories, regulates the center-of-mass position, and applies distance and collision constraints in multi-person interactions, suppressing motion artifacts and explicitly enforcing commonsense plausibility. Extensive experimental results suggest that CODA generates higher-quality interaction motion than existing state-of-the-art methods.
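As an illustration only, two of the artifact types named in the abstract (bone stretching and inter-person penetration) could be penalized roughly as in the NumPy sketch below. This is not the paper's CCL formulation: the function names `bone_length_loss` and `collision_loss`, the squared-error and hinge forms, the joint-level distance test, and the `min_dist` threshold are all assumptions made for the example.

```python
import numpy as np

def bone_length_loss(joints, bones, rest_lengths):
    """Penalize deviation of bone lengths from fixed rest lengths.

    joints: (T, J, 3) joint positions over T frames (hypothetical layout).
    bones: list of (parent, child) joint-index pairs.
    rest_lengths: (B,) reference length per bone.
    """
    parents = np.array([p for p, _ in bones])
    children = np.array([c for _, c in bones])
    # Per-frame length of each bone, shape (T, B).
    lengths = np.linalg.norm(joints[:, children] - joints[:, parents], axis=-1)
    return float(np.mean((lengths - np.asarray(rest_lengths)) ** 2))

def collision_loss(joints_a, joints_b, min_dist=0.1):
    """Hinge penalty on joint pairs of two people closer than min_dist.

    joints_a, joints_b: (T, J, 3) joint positions of the two people.
    min_dist: assumed minimum allowed joint-to-joint distance.
    """
    # Pairwise differences between every joint of A and every joint of B.
    diff = joints_a[:, :, None, :] - joints_b[:, None, :, :]  # (T, J, J, 3)
    dist = np.linalg.norm(diff, axis=-1)                      # (T, J, J)
    return float(np.mean(np.maximum(0.0, min_dist - dist) ** 2))
```

Both terms are zero for a motion that keeps bone lengths fixed and keeps the two bodies apart, and grow as the motion violates those conditions, so a weighted sum of such terms could be added to a generation loss.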
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 1955