Abstract: Advances in Multimodal Large Language Models (MLLMs) have improved human motion understanding. However, these models remain constrained by their "instruct-only" nature, lacking adaptability to diverse analytical perspectives. To address this limitation, we introduce ChatMotion, a multimodal multi-agent framework for human motion analysis. ChatMotion dynamically interprets user intent, decomposes complex tasks into meta-tasks, and activates specialized function modules for motion comprehension. It integrates a specialized toolset, MotionCore, to analyze human motion from multiple perspectives. Extensive experiments demonstrate ChatMotion's precision and adaptability in human motion understanding.
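The abstract describes a pipeline of intent interpretation, decomposition into meta-tasks, and dispatch to specialized modules. Below is a minimal, hypothetical Python sketch of that control flow; the names (`MetaTask`, `decompose`, the `MOTION_CORE` registry, the keyword-based planner) are illustrative assumptions, not the paper's actual API, and a real system would use an LLM planner rather than keyword routing.

```python
# Hypothetical sketch of the intent -> meta-tasks -> modules pipeline
# described in the abstract. All names and routing logic are assumptions
# for illustration, not ChatMotion's actual implementation.
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class MetaTask:
    """One atomic analysis step produced by task decomposition."""
    name: str   # which MotionCore module should handle this step
    query: str  # the sub-question routed to that module


def decompose(user_request: str) -> List[MetaTask]:
    """Stand-in planner: a real system would call an LLM to interpret
    intent and split the request; here we route on simple keywords."""
    tasks: List[MetaTask] = []
    if "describe" in user_request or "caption" in user_request:
        tasks.append(MetaTask("captioner", user_request))
    if "score" in user_request or "quality" in user_request:
        tasks.append(MetaTask("scorer", user_request))
    return tasks or [MetaTask("captioner", user_request)]


# Hypothetical MotionCore registry: module name -> callable analyzer.
MOTION_CORE: Dict[str, Callable[[str], str]] = {
    "captioner": lambda q: f"[caption for: {q}]",
    "scorer": lambda q: f"[quality score for: {q}]",
}


def chat_motion(user_request: str) -> List[str]:
    """Run the full pipeline: decompose, then dispatch each meta-task."""
    return [MOTION_CORE[t.name](t.query) for t in decompose(user_request)]


if __name__ == "__main__":
    print(chat_motion("describe the dancer's motion and score its quality"))
```

Under these assumptions, a single user request fans out into multiple meta-tasks, each handled by a dedicated module, mirroring the multi-agent decomposition the abstract claims.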
Paper Type: Long
Research Area: Human-Centered NLP
Research Area Keywords: multimodal, large language models, human motion analysis, interactive systems
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English
Keywords: multimodal, large language models, human motion analysis, interactive systems
Submission Number: 1460