Language Model Adaptation for Lecture Transcription by Document Retrieval

Adria A. Martinez-Villaronga, Miguel A. del Agua, Joan Albert Silvestre-Cerdà, Jesús Andrés-Ferrer, Alfons Juan

2014 (modified: 18 Sept 2021), IberSPEECH 2014, Readers: Everyone

Abstract: With the spread of MOOCs and video lecture repositories, accurate methods for automatically transcribing video lectures are more important than ever. In this work, we propose a simple yet effective language model adaptation technique based on document retrieval from the web. This technique is combined with slide adaptation and compared against both a strong baseline language model and a stronger slide-adapted baseline. The adaptation techniques are evaluated with two different acoustic models: a standard HMM model and a CD-DNN-HMM model. The proposed method obtains relative WER improvements of up to 14% over a competitive baseline, and it also outperforms slide adaptation.
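
As a reading aid for the "14% relative" figure, the following minimal Python sketch shows how a relative WER improvement is conventionally computed from two absolute WER values. The function name `relative_wer_reduction` and the numbers used are hypothetical illustrations, not results or code from the paper.

```python
def relative_wer_reduction(baseline_wer: float, adapted_wer: float) -> float:
    """Return the relative WER reduction, in percent, of an adapted system
    with respect to a baseline. Both inputs are absolute WERs in percent."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - adapted_wer) / baseline_wer


# Hypothetical values chosen only for illustration (NOT taken from the paper):
# a baseline at 20.0% WER and an adapted system at 17.2% WER.
example = relative_wer_reduction(20.0, 17.2)
print(f"relative WER reduction: {example:.1f}%")
```

Under these assumed values, an absolute drop of 2.8 WER points corresponds to a relative reduction of about 14%, which is why relative and absolute improvements should not be confused when comparing systems.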
