Grounded Language Modeling for Automatic Speech Recognition of Sports Video

Michael Fleischman, Deb Roy

2008 (modified: 13 Nov 2022), ACL 2008

Abstract: Grounded language models represent the relationship between words and the non-linguistic context in which they are said. This paper describes how they are learned from large corpora of unlabeled video and applied to the task of automatic speech recognition of sports video. Results show that grounded language models improve perplexity and word error rate over text-based language models, and further, that they support video information retrieval better than human-generated speech transcriptions.
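
Word error rate, one of the two metrics named in the abstract, is conventionally computed as the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the paper's scoring setup: the function name `word_error_rate`, the whitespace tokenization, and the sample sentences are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly
    (no case folding or punctuation normalization).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into nothing
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical transcripts, made up for illustration only.
    print(word_error_rate("the batter hits a home run", "the batter hit a run"))
```

A lower value means fewer word-level errors; the value can exceed 1.0 when the hypothesis contains many inserted words.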
