Same Meeting, Different Minutes: A Controlled Cross-Platform Evaluation of Commercial AI Meeting Summarizers

ACL ARR 2026 May Submission16818 Authors

26 May 2026 (modified: 02 Jun 2026) · ACL ARR 2026 May Submission · CC BY 4.0

Keywords: summary

Abstract: Workplaces increasingly use intelligent agents to take notes and produce summaries during in-person and virtual meetings. Despite the widespread adoption of AI meeting summarization across commercial platforms, systematic quality evaluation remains limited. This study presents the first cross-platform analysis of meeting summarization performance, comparing native-platform summarizers (Google Meet, Zoom, and Microsoft Teams), agent-based summarizers (Otter.ai and Fireflies.ai), and general-purpose LLMs (ChatGPT, Llama, and Claude), all run on identical meeting content. Our novel methodology scores multiple runs of generated summaries with automated metrics (ROUGE, BLEU, METEOR, BERTScore, and LSA). This standardized evaluation framework lays the groundwork for future investigations of summary quality under varying conditions, such as changes in signal-to-noise ratio and speaker volume. Our work establishes the first benchmark for commercial meeting summarization tools, revealing significant performance disparities and providing actionable insights for both platform users and researchers.

Paper Type: Long

Research Area: Summarization

Research Area Keywords: conversational summarization, abstractive summarisation, evaluation, factuality, benchmarking, evaluation methodologies, metrics, reproducibility, statistical testing for evaluation, human evaluation, automatic evaluation

Contribution Types: Data analysis

Languages Studied: English

EMNLP 2026 AI Reviewing Experiment: yes

Submission Number: 16818