Towards better subtitles: A multilingual approach for punctuation restoration of speech transcripts

Nuno Miguel Guerreiro, Ricardo Rei, Fernando Batista

2021 (modified: 18 Nov 2021), Expert Syst. Appl. 2021

Abstract:

Highlights
• A state-of-the-art approach for multilingual punctuation prediction.
• Knowledge about punctuation from pre-trained transformer-based encoder models.
• Monolingual models tested both in human-edited and in automatic transcripts.
• Single multilingual model predicts punctuation in multiple languages.
• Integration within an existing multilingual video subtitling pipeline.

Abstract
This paper proposes a flexible approach for punctuation prediction that can be used to produce state-of-the-art results in a multilingual scenario. We performed experiments using transcripts of TED Talks from the IWSLT 2017 and IWSLT 2011 evaluation campaigns. Our experiments show that recognition errors in the ASR output degrade the performance of our models, in line with related literature. Our monolingual models perform consistently on human-edited transcripts of German, Dutch, Portuguese and Romanian, and the results suggest that, with pre-trained contextual models, commas may be more difficult to predict than periods. We also trained a single multilingual model that predicts punctuation in multiple languages and achieves results comparable to those of the monolingual models, providing evidence of the potential of a single multilingual model to solve the task for multiple languages. We then argue that current punctuation systems in the literature implicitly depend on correct segmentation of ASR outputs, because they rely on positional information to solve the punctuation task. This requirement is too strong for a real-life application. Through several experiments, we show that our method for training and testing models is more robust to different segmentations. These contributions are of particular importance in our multilingual pipeline, since they avoid training a different model for each of the involved languages, and they guarantee that the model will be more robust to incorrect segmentation of the ASR outputs in comparison with other methods in the literature. To the best of our knowledge, we report the first experiments using a single multilingual model for punctuation restoration in multiple languages.
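The task framing behind the abstract can be illustrated with a minimal sketch. It assumes punctuation restoration is treated as per-word labeling, where each word carries the label of the mark that follows it. The label set, the lowercasing step, and the function names `to_labeled_words`, `fixed_windows` and `restore` are hypothetical choices made for illustration; the abstract does not specify the authors' label inventory or their training and testing procedure. The fixed-size windows only show why segment boundaries in an ASR stream need not coincide with sentence ends.

```python
from typing import List, Tuple

# Hypothetical label set for illustration only; not taken from the paper.
PUNCT_LABELS = {",": "COMMA", ".": "PERIOD", "?": "QUESTION"}

Pair = Tuple[str, str]


def to_labeled_words(text: str) -> List[Pair]:
    """Strip punctuation and label each word with the mark that followed it."""
    pairs: List[Pair] = []
    for token in text.split():
        label = "O"  # "O" means no punctuation after this word
        # Peel trailing marks; the mark closest to the word determines the label.
        while token and token[-1] in PUNCT_LABELS:
            label = PUNCT_LABELS[token[-1]]
            token = token[:-1]
        if token:
            # Lowercasing mimics unpunctuated, uncased ASR output (an assumption).
            pairs.append((token.lower(), label))
    return pairs


def fixed_windows(pairs: List[Pair], size: int) -> List[List[Pair]]:
    """Cut the labeled stream into fixed-size chunks that ignore sentence ends."""
    return [pairs[i:i + size] for i in range(0, len(pairs), size)]


def restore(pairs: List[Pair]) -> str:
    """Rebuild punctuated (still lowercased) text from (word, label) pairs."""
    inverse = {label: mark for mark, label in PUNCT_LABELS.items()}
    return " ".join(word + inverse.get(label, "") for word, label in pairs)


if __name__ == "__main__":
    pairs = to_labeled_words("Well, this works. Does it?")
    for window in fixed_windows(pairs, 2):
        print(window)
    print(restore(pairs))
```

In this sketch, the second window starts with the last word of one sentence and ends with the first word of the next. A model that assumes each input segment is a complete sentence would be misled by such a window, which is the kind of segmentation dependence the abstract argues against.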