Keywords: speech datasets
Abstract: Automatic Speech Recognition and Text-to-Speech systems are primarily trained in a supervised fashion and require high-quality, accurately labeled speech datasets.
In this work, we examine common problems with speech data and introduce a toolbox for the construction and interactive error analysis of speech datasets. The construction tool is based on K{\"u}rzinger et al. work, and, to the best of our knowledge, the dataset exploration tool is the world's first open-source tool of this kind. We demonstrate how to apply these tools to create a Russian speech dataset and analyze existing speech datasets (Multilingual LibriSpeech, Mozilla Common Voice). The tools are open sourced as a part of the NeMo framework.
URL: n/a
TL;DR: An introduction to a Toolbox for Construction and Analysis of Speech Datasets.
Supplementary Material: zip
Contribution Process Agreement: Yes
Author Statement: Yes
10 Replies
Loading