ASR Language Resources for Faroese

Carlos Daniel Hernández Mena; Annika Simonsen; Jon Gudnason

Published: 20 Mar 2023, Last Modified: 21 Apr 2023, NoDaLiDa 2023, Readers: Everyone

Keywords: Faroese, Acoustic Model, Faroese ASR, Language Model, Pronunciation Dictionary, Faroese Language Resources

TL;DR: Presenting a set of novel language resources in Faroese suitable for Speech Recognition

Abstract: The aim of this work is to present a set of novel language resources in Faroese suitable for the field of Automatic Speech Recognition, including: an ASR corpus comprising 109 hours of transcribed speech data; acoustic models for systems such as WAV2VEC2, NVIDIA-NeMo, Kaldi and PocketSphinx; a set of n-gram language models; and a set of pronunciation dictionaries with two different variants of Faroese. We also compare the results obtained with the different acoustic models presented here. All the resources presented in this document are publicly available under Creative Commons licences.
