Abstract: We introduce \textbf{SpeechQC-Agent}, a natural language–driven, multi-agent framework for automated verification of large-scale, multilingual speech-text datasets. Our system leverages a central Large Language Model (LLM) to interpret user-specified verification prompts and orchestrate a set of specialized agents that perform audio, transcript, and metadata quality checks. Each prompt is translated into a structured, dependency-aware workflow graph, executed through a combination of dynamically generated and pre-defined tools. To support evaluation, we release \textbf{SpeechQC-Dataset}, a synthetic yet realistic benchmark covering 15.5 hours of Hindi dialogue across diverse speakers, domains, and error types. Experiments across two verification stages, QC1 (audio and metadata) and QC2 (transcript and content), show that ChatGPT-based agents outperform open-weight LLMs in planning accuracy and execution robustness. We further adapt recent agentic evaluation protocols to measure workflow fidelity via subsequence and subgraph metrics. Our framework enables scalable, reproducible, and instruction-driven speech dataset verification, laying the foundation for high-quality speech corpus creation in low-resource settings.
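To make the "dependency-aware workflow graph" idea concrete, below is a minimal sketch (not the authors' code) of how a verification prompt could be compiled into a set of QC steps and executed in dependency order. All names here (`QCStep`, `run_workflow`, and the toy tool functions) are hypothetical illustrations, not part of the released framework.

```python
# Minimal sketch of a dependency-aware QC workflow graph, assuming each
# step is a named tool with an explicit list of prerequisite steps.
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # stdlib, Python 3.9+
from typing import Callable

@dataclass
class QCStep:
    name: str                               # e.g. "check_sample_rate"
    tool: Callable                          # pre-defined or dynamically generated tool
    depends_on: list[str] = field(default_factory=list)

def run_workflow(steps: list[QCStep], dataset_path: str) -> dict[str, object]:
    """Execute QC steps in an order that respects their dependencies."""
    graph = {s.name: set(s.depends_on) for s in steps}   # node -> predecessors
    by_name = {s.name: s for s in steps}
    results: dict[str, object] = {}
    for name in TopologicalSorter(graph).static_order():
        # Each tool sees the dataset path and all upstream results.
        results[name] = by_name[name].tool(dataset_path, results)
    return results

# Toy QC1-style pass (audio and metadata checks) on a hypothetical path:
steps = [
    QCStep("load_metadata", lambda p, r: {"rows": 120}),
    QCStep("check_sample_rate", lambda p, r: "16 kHz OK", ["load_metadata"]),
    QCStep("check_clipping", lambda p, r: "no clipping", ["load_metadata"]),
]
print(run_workflow(steps, "speechqc_dataset/"))
```

In a full system, the central LLM would emit the step list and dependency edges from the user's natural-language prompt; the topological execution shown here is one straightforward way to honor those dependencies.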
Paper Type: Long
Research Area: Speech Recognition, Text-to-Speech and Spoken Language Understanding
Research Area Keywords: Automatic speech recognition, low resource, agent, large language model
Contribution Types: Model analysis & interpretability, Approaches to low-resource settings
Languages Studied: Hindi
Submission Number: 7787