ANT: Adaptive Neural Temporal-Aware Text-to-Motion Model

Wenshuo Chen, Kuimou Yu, Jia Haozhe, Kaishen Yuan, Zexu Huang, Bowen Tian, Songning Lai, Hongru Xiao, Erhang Zhang, Lei Wang, Yutao Yue

Published: 27 Oct 2025, Last Modified: 29 Nov 2025CrossrefEveryoneRevisionsCC BY-SA 4.0
Abstract: While diffusion models advance text-to-motion generation, their static semantic conditioning ignores temporal-frequency demands: early denoising requires structural semantics for motion foundations while later stages need localized details for text alignment. This mismatch mirrors biological morphogenesis where developmental phases demand distinct genetic programs. Inspired by epigenetic regulation governing morphological specialization, we propose (ANT), an Adaptive Neural Temporal-Aware architecture. ANT orchestrates semantic granularity through: (i) Semantic Temporally Adaptive (STA) Module: Automatically partitions denoising into low-frequency structural planning and high-frequency refinement via spectral analysis. (ii) Dynamic Classifier-Free Guidance scheduling (DCFG): Adaptively adjusts conditional to unconditional ratio enhancing efficiency while maintaining fidelity. Extensive experiments show that ANT can be applied to various baselines, significantly improving model performance, and achieving state-of-the-art semantic alignment on StableMoFusion. Code can be found on https://github.com/CCSCovenant/ANT.
Loading