Improved Text-Driven Human Motion Generation via Out-of-Distribution Detection and Rectification

Published: 01 Jan 2024, Last Modified: 16 Apr 2025, Venue: CVM (1) 2024, License: CC BY-SA 4.0
Abstract: Text-driven human motion generation has recently gained momentum thanks to its potential to shape interactive computer graphics in the era of AI. Despite the considerable effort invested so far, existing methods still struggle to ensure fluidity and body coordination in the generated motions, which seriously hinders their application in areas such as gaming, animation, and the emerging metaverse. One cause is that learning directly from motion data is prone to interference from noise within the data, which reduces the quality of the generated motions. In this study, we propose, for the first time, to improve text-to-motion generation via out-of-distribution detection in the embedding space. Using a Z-score-based outlier detection algorithm, we mask motion data within the motion encoder and replace the detected outlier values with means, keeping the data distribution consistent. To verify the effectiveness of the proposed method, we conducted extensive experiments on the widely used KIT-ML dataset. Experimental results indicate that, compared to previous frameworks, our solution significantly improves the quality of text-driven human motion generation.
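The detect-and-rectify step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `rectify_outliers`, the threshold of 3.0, the `(frames, features)` array layout, and the choice to compute the mean and standard deviation per feature over frames are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np


def rectify_outliers(embeddings, threshold=3.0):
    """Mask Z-score outliers and replace them with the per-feature mean.

    embeddings: array of shape (frames, features); this layout is an assumption.
    threshold:  absolute Z-score above which a value counts as an outlier
                (3.0 is a common convention, not a value from the paper).
    Returns the rectified array and the boolean outlier mask.
    """
    x = np.asarray(embeddings, dtype=float)

    # Per-feature statistics over the frame axis (population std, ddof=0).
    mean = x.mean(axis=0, keepdims=True)
    std = x.std(axis=0, keepdims=True)

    # Avoid division by zero for constant features: their Z-scores become 0.
    safe_std = np.where(std > 0, std, 1.0)
    z = (x - mean) / safe_std

    # Flag values whose Z-score magnitude exceeds the threshold.
    mask = np.abs(z) > threshold

    # Replace flagged values with the mean of their feature column.
    rectified = np.where(mask, mean, x)
    return rectified, mask


if __name__ == "__main__":
    # Hypothetical toy data: 20 frames, 2 features, one injected spike.
    data = np.zeros((20, 2))
    data[7, 0] = 100.0
    fixed, outliers = rectify_outliers(data)
    print("outliers found:", int(outliers.sum()))
    print("replacement value:", fixed[7, 0])
```

Note that in this sketch the mean used for replacement is computed with the outliers still included, so a large spike shifts the replacement value; computing the mean over unmasked values only would be an alternative design.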