Abstract: Advances in Multimodal Large Language Models (MLLMs) have improved human motion understanding. However, these models remain constrained by their "instruct-only" nature, lacking adaptability to diverse analytical perspectives. To address this limitation, we introduce ChatMotion, a multimodal multi-agent framework for human motion analysis. ChatMotion dynamically interprets user intent, decomposes complex tasks into meta-tasks, and activates specialized function modules for motion comprehension. It integrates a specialized toolset, MotionCore, to analyze human motion from multiple perspectives. Extensive experiments demonstrate ChatMotion's precision and adaptability in human motion understanding.
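The abstract describes a pipeline of intent interpretation, decomposition into meta-tasks, and dispatch to specialized modules. Below is a minimal, hypothetical Python sketch of that control flow; the names (`MetaTask`, `decompose`, the `MOTION_CORE` registry, the keyword-based planner) are illustrative assumptions, not the paper's actual API, and a real system would use an LLM planner rather than keyword routing.

```python
# Hypothetical sketch of the intent -> meta-tasks -> modules pipeline
# described in the abstract. All names and routing logic are assumptions
# for illustration, not ChatMotion's actual implementation.
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class MetaTask:
    """One atomic analysis step produced by task decomposition."""
    name: str   # which MotionCore module should handle this step
    query: str  # the sub-question routed to that module


def decompose(user_request: str) -> List[MetaTask]:
    """Stand-in planner: a real system would call an LLM to interpret
    intent and split the request; here we route on simple keywords."""
    tasks: List[MetaTask] = []
    if "describe" in user_request or "caption" in user_request:
        tasks.append(MetaTask("captioner", user_request))
    if "score" in user_request or "quality" in user_request:
        tasks.append(MetaTask("scorer", user_request))
    return tasks or [MetaTask("captioner", user_request)]


# Hypothetical MotionCore registry: module name -> callable analyzer.
MOTION_CORE: Dict[str, Callable[[str], str]] = {
    "captioner": lambda q: f"[caption for: {q}]",
    "scorer": lambda q: f"[quality score for: {q}]",
}


def chat_motion(user_request: str) -> List[str]:
    """Run the full pipeline: decompose, then dispatch each meta-task."""
    return [MOTION_CORE[t.name](t.query) for t in decompose(user_request)]


if __name__ == "__main__":
    print(chat_motion("describe the dancer's motion and score its quality"))
```

Under these assumptions, a single user request fans out into multiple meta-tasks, each handled by a dedicated module, mirroring the multi-agent decomposition the abstract claims.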
Paper Type: Long
Research Area: Human-Centered NLP
Research Area Keywords: multimodal, large language models, human motion analysis, interactive systems
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English
Keywords: multimodal, large language models, human motion analysis, interactive systems
Submission Number: 1460