ChatMotion: A Multimodal Multi-Agent for Human Motion Analysis

ACL ARR 2025 May Submission 1460 Authors

17 May 2025 (modified: 03 Jul 2025) · ACL ARR 2025 May Submission · CC BY 4.0
Abstract: Advances in Multimodal Large Language Models (MLLMs) have improved human motion understanding. However, these models remain constrained by their "instruct-only" nature, lacking the adaptability needed for diverse analytical perspectives. To address this limitation, we introduce ChatMotion, a multimodal multi-agent framework for human motion analysis. ChatMotion dynamically interprets user intent, decomposes complex tasks into meta-tasks, and activates specialized function modules for motion comprehension. It integrates a specialized toolset, MotionCore, to analyze human motion from multiple perspectives. Extensive experiments demonstrate ChatMotion's precision and adaptability in human motion understanding.
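The abstract describes a plan-and-dispatch pattern: interpret user intent, decompose the request into meta-tasks, route each meta-task to a specialized module, and aggregate the results. The Python sketch below is a minimal, hypothetical illustration of that control flow; all names (MetaTask, MotionToolset, decompose, analyze) are assumptions for exposition, not the authors' actual API, and the MLLM-driven intent interpreter is replaced by a toy keyword rule.

```python
# Hypothetical sketch of the plan -> dispatch -> aggregate loop described
# in the abstract. Names and logic are illustrative assumptions only.

from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class MetaTask:
    """One atomic analysis step produced by task decomposition."""
    module: str   # name of the function module to invoke
    query: str    # the sub-question that module should answer


class MotionToolset:
    """Registry of specialized analysis modules (a stand-in for MotionCore)."""

    def __init__(self) -> None:
        self._modules: Dict[str, Callable[[str], str]] = {}

    def register(self, name: str, fn: Callable[[str], str]) -> None:
        self._modules[name] = fn

    def run(self, task: MetaTask) -> str:
        return self._modules[task.module](task.query)


def decompose(user_query: str) -> List[MetaTask]:
    """Toy intent interpreter; in the paper this step would be MLLM-driven."""
    tasks = [MetaTask("captioner", user_query)]
    if "compare" in user_query.lower():
        tasks.append(MetaTask("comparator", user_query))
    return tasks


def analyze(user_query: str, toolset: MotionToolset) -> str:
    """Execute each meta-task and aggregate the module outputs."""
    results = [toolset.run(t) for t in decompose(user_query)]
    return " ".join(results)


if __name__ == "__main__":
    tools = MotionToolset()
    tools.register("captioner", lambda q: f"[caption for: {q}]")
    tools.register("comparator", lambda q: f"[comparison for: {q}]")
    print(analyze("Compare the two jump motions", tools))
```

The registry design lets new analysis perspectives be added without changing the planning loop, which is the adaptability property the abstract attributes to the framework.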
Paper Type: Long
Research Area: Human-Centered NLP
Research Area Keywords: multimodal, large language models, human motion analysis, interactive systems
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English
Keywords: multimodal, large language models, human motion analysis, interactive systems
Submission Number: 1460