Abstract: In recent years there has been great interest in addressing the low-resource status of African languages and in providing baseline models for different Natural Language Processing (NLP) tasks. Several initiatives on the continent use the Bible as a data source to provide proof of concept for some NLP tasks. In this work, we present the Lingala Speech Translation (LiSTra) dataset, release a full pipeline for constructing such datasets in other languages, and report baselines using both the traditional cascade approach (Automatic Speech Recognition -> Machine Translation) and a transformer-based end-to-end architecture with a custom interactive attention that allows information sharing between the recognition decoder and the translation decoder.