DualMind: A Fast-Slow Thinking Agent for Meeting Assistance with Agent-Wake-Up Dataset and Comprehensive Benchmark

ACL ARR 2025 February Submission6838 Authors

16 Feb 2025 (modified: 09 May 2025), ACL ARR 2025 February Submission, CC BY 4.0

Abstract: In an era where hybrid and multilingual meetings have become the norm, AI meeting assistants must efficiently handle both high-volume routine queries and intricate, context-rich tasks. Our analysis of existing meeting AI assistants reveals several critical limitations: (1) unstable content relevance affecting response accuracy, (2) uniform processing of both simple and complex queries leading to suboptimal response times, and (3) insufficient multimodal support for diverse meeting scenarios. These limitations significantly impact meeting experiences and efficiency. In this paper, we introduce DualMind, a dual-process meeting assistance system designed to strike an optimal balance between rapid reaction and careful reasoning. Our work makes three key contributions: (1) AISHELL-Agent, a multimodal conferencing dataset that captures a comprehensive spectrum of meeting interactions and query complexities; (2) AMBER (Agent Meeting BEnchmark fRamework), a multi-criteria evaluation suite for measuring meeting assistant performance; and (3) DualMind's dual-agent architecture featuring Talker for fast-thinking responses and Planner for complex reasoning tasks. Comprehensive evaluation on AISHELL-Agent through AMBER demonstrates DualMind's superiority, achieving 1500 ms faster responses for routine queries and 22.5% better complex task outcomes than single-model baselines. The dataset enables robust cross-scenario validation while AMBER provides multidimensional performance insights, establishing DualMind as an effective solution balancing speed and reasoning depth. Our work pioneers a cognitive-inspired paradigm for AI assistants, emphasizing the synergy of specialized datasets, nuanced evaluation frameworks, and psychology-inspired architectures.

Paper Type: Long

Research Area: Dialogue and Interactive Systems

Research Area Keywords: Dialogue and Interactive Systems, Resources and Evaluation, Generation, Speech Recognition, Text-to-Speech and Spoken Language Understanding, Multimodality and Language Grounding

Contribution Types: NLP engineering experiment, Publicly available software and/or pre-trained models, Data resources, Data analysis

Languages Studied: Mandarin, English

Submission Number: 6838
