Same Meeting, Different Minutes: A Controlled Cross-Platform Evaluation of Commercial AI Meeting Summarizers
Keywords: summary
Abstract: Increasingly, workplaces are using intelligent agents to take notes and provide summaries during in-person and virtual meetings. Despite the widespread adoption of AI meeting summarization across commercial platforms, systematic quality evaluation remains limited. This study presents the first cross-platform analysis of meeting summarization performance, comparing native-platform summarizers (Google Meet, Zoom, and Microsoft Teams), agent-based summarizers (Otter.ai and Fireflies.ai), and general-purpose LLMs (ChatGPT, Llama, and Claude), all evaluated on identical meeting content. Our novel methodology uses automated metrics (ROUGE, BLEU, METEOR, BERTScore, and LSA) to score multiple runs of generated summaries. This standardized evaluation framework lays the groundwork for future investigations of summary quality under varying conditions, such as changes in signal-to-noise ratio and speaker volume. Our work establishes the first benchmark for commercial meeting summarization tools, revealing significant performance disparities and providing actionable insights for both platform users and researchers.
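To illustrate what "scoring multiple runs of generated summaries" can look like, here is a minimal sketch under stated assumptions. It covers only a simplified ROUGE-1 F1 (clipped unigram overlap with lowercase whitespace tokenization) and aggregates scores across repeated runs of one summarizer as a mean and sample standard deviation. The function names `rouge1_f1` and `score_runs` are hypothetical. The study's actual metric implementations, tokenization, and aggregation are not specified here and may differ. For example, standard ROUGE packages typically offer stemming and several ROUGE variants.

```python
from collections import Counter
from statistics import mean, stdev


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1 (assumption: lowercase whitespace tokenization, no stemming)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped unigram matches: each token counts at most min(count in cand, count in ref).
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def score_runs(runs: list[str], reference: str) -> tuple[float, float]:
    """Score several runs of one summarizer against a reference summary.

    Returns (mean, sample standard deviation). Reporting run-to-run spread this way
    is an assumption about aggregation, not necessarily the paper's procedure.
    """
    scores = [rouge1_f1(run, reference) for run in runs]
    spread = stdev(scores) if len(scores) > 1 else 0.0
    return mean(scores), spread


if __name__ == "__main__":
    reference = "the team agreed to ship on friday"
    runs = ["team agreed to ship friday", "the team will ship on friday"]
    avg, sd = score_runs(runs, reference)
    print(f"ROUGE-1 F1 mean={avg:.4f} sd={sd:.4f}")
```

Because commercial summarizers are not deterministic, reporting the spread across runs alongside the mean shows whether a difference between two platforms is larger than the variation within a single platform.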
Paper Type: Long
Research Area: Summarization
Research Area Keywords: conversational summarization, abstractive summarisation, evaluation, factuality, benchmarking, evaluation methodologies, metrics, reproducibility, statistical testing for evaluation, human evaluation, automatic evaluation
Contribution Types: Data analysis
Languages Studied: English
EMNLP 2026 AI Reviewing Experiment: yes
Submission Number: 16818