VERSA-v2: A Modular and Scalable Toolkit for Speech and Audio Evaluation with Expanded Metrics, Visualization, and LLM Integration
Keywords: Speech evaluation, audio evaluation, music evaluation, speech profiling, speech analysis, spoken signal analysis
TL;DR: A major upgrade of the Versatile Evaluation of Speech and Audio (VERSA) toolkit for standardized and scalable evaluation across speech, audio, and music.
Presentation Preference: Open to it if recommended by organizers
Abstract: We present VERSA-v2, a major upgrade of the Versatile Evaluation of Speech and Audio (VERSA) toolkit for standardized and scalable evaluation across speech, audio, and music tasks. It features a modular, object-oriented architecture that simplifies metric integration and now supports over 100 metrics, organized into curated task-specific packs. VERSA-v2 also introduces interactive visualizations, per-metric profiling, and prompt-based evaluation using both text- and audio-based large language models (LLMs). These advancements make VERSA-v2 a robust, extensible, and LLM-enabled platform for comprehensive and interpretable speech and audio evaluation.
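The abstract credits the modular, object-oriented design with making metric integration simple and with allowing metrics to be grouped into task-specific packs. The sketch below shows one common way to build that kind of plugin structure: a shared base class, a decorator-based registry, and packs defined as named lists of metrics. All names here (Metric, register_metric, METRIC_REGISTRY, PACKS, run_pack, "example_pack") and both toy metrics are hypothetical illustrations. They are not VERSA-v2's actual API, and the real toolkit may be organized differently.

```python
import math
from abc import ABC, abstractmethod
from typing import Dict, List, Optional, Sequence, Type

# Hypothetical registry mapping metric names to classes (illustrative only,
# not VERSA-v2's real API).
METRIC_REGISTRY: Dict[str, Type["Metric"]] = {}


def register_metric(name: str):
    """Class decorator that adds a metric class to the registry under `name`."""
    def decorator(cls: Type["Metric"]) -> Type["Metric"]:
        METRIC_REGISTRY[name] = cls
        return cls
    return decorator


class Metric(ABC):
    """Common interface that every metric implements."""

    @abstractmethod
    def compute(self, pred: Sequence[float],
                ref: Optional[Sequence[float]] = None) -> float:
        ...


@register_metric("rms")
class RMS(Metric):
    """Reference-free toy metric: root-mean-square level of the prediction."""

    def compute(self, pred, ref=None):
        return math.sqrt(sum(x * x for x in pred) / len(pred))


@register_metric("snr")
class SNR(Metric):
    """Reference-based toy metric: SNR in dB, treating (ref - pred) as noise."""

    def compute(self, pred, ref=None):
        if ref is None or len(ref) != len(pred):
            raise ValueError("snr needs a reference of the same length")
        signal = sum(r * r for r in ref)
        noise = sum((r - p) ** 2 for r, p in zip(ref, pred))
        return math.inf if noise == 0 else 10 * math.log10(signal / noise)


# Hypothetical task-specific pack: a named bundle of registered metric names.
PACKS: Dict[str, List[str]] = {
    "example_pack": ["rms", "snr"],
}


def run_pack(pack: str, pred, ref=None) -> Dict[str, float]:
    """Instantiate and run every metric in a pack, returning name -> score."""
    return {name: METRIC_REGISTRY[name]().compute(pred, ref)
            for name in PACKS[pack]}


if __name__ == "__main__":
    print(run_pack("example_pack", [1.0, -1.0, 1.0, -1.0],
                   [2.0, -2.0, 2.0, -2.0]))
```

In this design, adding a metric requires only one decorated subclass, and the evaluation loop does not change. A pack is simply a list of registered names, so curating metrics for a particular task amounts to editing that list.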
Submission Number: 18