Keywords: Motion Generation, Information Density, Information Distribution Mining
TL;DR: This paper introduces a simple yet effective framework that leverages information density to dynamically balance text and motion guidance, establishing a competitive baseline for text-to-motion generation.
Abstract: Text-to-motion generation has advanced significantly with diffusion models. However, existing approaches typically assume that text and motion guidance contribute equally at every point in time. This oversimplification often leads either to semantically faithful but unnatural motions or to smooth trajectories that drift from the intended meaning. To address these limitations, we introduce the Information-Balanced Motion Generator (IBMG), a simple yet effective framework that dynamically balances text and motion guidance through the lens of information density. Specifically, we define text information density by measuring the semantic alignment between motion segments and textual tokens, while motion information density is captured via temporal variation across segments. These distributions are encoded into an information-balance embedding that adaptively modulates the relative influence of text and motion during generation, thereby balancing semantic fidelity with motion naturalness. Extensive experiments across two benchmarks and three backbones demonstrate that IBMG consistently improves generation quality. On HumanML3D, it reduces FID by 60.5% and lowers trajectory error by 4.3% compared to baselines. These results highlight information density as a key principle for harmonizing semantic fidelity with temporal coherence, establishing IBMG as a competitive baseline for text-to-motion generation.
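To make the two densities concrete, the following is a minimal sketch of one way they could be computed per motion segment. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not specify the formulas, so the function names (`motion_density`, `text_density`, `text_weight`), the use of mean absolute frame differences for temporal variation, the best-token cosine similarity for semantic alignment, and the ratio used to balance the two are all hypothetical choices. The learned information-balance embedding is replaced here by a plain per-segment weight.

```python
import math


def normalize(values):
    """Scale non-negative scores so they sum to 1 (uniform if all are zero)."""
    total = sum(values)
    if total == 0:
        return [1.0 / len(values)] * len(values)
    return [v / total for v in values]


def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def motion_density(frames, num_segments):
    """Assumed proxy for motion information density: the mean absolute
    frame-to-frame change inside each segment, normalized over segments.
    Trailing frames that do not fill a whole segment are ignored."""
    seg_len = len(frames) // num_segments
    scores = []
    for s in range(num_segments):
        seg = frames[s * seg_len:(s + 1) * seg_len]
        diffs = [
            sum(abs(b - a) for a, b in zip(prev, cur))
            for prev, cur in zip(seg, seg[1:])
        ]
        scores.append(sum(diffs) / len(diffs) if diffs else 0.0)
    return normalize(scores)


def text_density(segment_embs, token_embs):
    """Assumed proxy for text information density: each segment's best
    cosine match against any token, clipped at 0, normalized over segments."""
    scores = [
        max(0.0, max(cosine(seg, tok) for tok in token_embs))
        for seg in segment_embs
    ]
    return normalize(scores)


def text_weight(text_d, motion_d):
    """Hypothetical balance rule: per-segment share of guidance given to
    text. Motion guidance would receive 1 - w for the same segment."""
    return [
        t / (t + m) if (t + m) > 0 else 0.5
        for t, m in zip(text_d, motion_d)
    ]


# Toy input: a still first half and a moving second half (1-D "poses"),
# with segment embeddings where only the first segment matches the token.
frames = [[0.0], [0.0], [0.0], [0.0], [0.0], [1.0], [2.0], [3.0]]
m = motion_density(frames, num_segments=2)
t = text_density([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0]])
w = text_weight(t, m)
```

In this toy case the segment that aligns with the text but barely moves leans on text guidance, while the segment with high temporal variation and no textual match leans on motion guidance, which is the kind of adaptive trade-off the abstract describes.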
Primary Area: generative models
Submission Number: 13293