Abstract: Speech datasets available for training Romanian automatic speech recognition (ASR) systems are built around similar demographics (male voices, speakers aged 19 to 29). In this paper, we present a dataset of underrepresented Romanian speech (USPDATRO), constructed from open data. We fine-tune a state-of-the-art Whisper model on existing datasets and evaluate the resulting model on the dataset of underrepresented speech. Results indicate that more such data is needed to improve the performance of Romanian ASR systems.