Keywords: speech dataset, automatic speech recognition, evaluation, benchmark, bigos, polish, asr
TL;DR: Comprehensive benchmarking system for Polish ASR, curating diverse datasets and evaluation tools to improve reproducibility, transparency, and performance assessment across commercial and open-source models.
Abstract: Speech datasets available in the public domain are often underutilized because of challenges in accessibility and interoperability. To address this, a system to survey, catalog, and curate existing speech datasets was developed, enabling reproducible evaluation of automatic speech recognition (ASR) systems. The system was applied to curate over 24 datasets and evaluate 25 ASR models, with a specific focus on Polish. This research represents the most extensive comparison to date of commercial and free ASR systems for the Polish language, drawing insights from 600 system-model-test-set evaluations across 8 analysis scenarios. Curated datasets and benchmark results are publicly available. The evaluation tools are open-sourced to support reproducibility of the benchmark, encourage community-driven improvements, and facilitate adaptation to other languages.
Submission Number: 1289