Exceptions in language as learned by the multi-factor sparse plus low-rank language model

Brian Hutchinson, Mari Ostendorf, Maryam Fazel

2013 (modified: 08 Nov 2022), ICASSP 2013

Abstract: Word usage is influenced by diverse factors, including topic, genre and various speaker/author characteristics. To characterize these aspects of language, we introduce the “Multi-Factor Sparse Plus Low Rank” exponential language model, which allows supervised joint training of arbitrary overlapping factor-specific model components. This flexible architecture has the advantage of being highly interpretable. The elements of sparse parameter matrices can be viewed as factor-dependent corrections (e.g. topic- or speaker-dependent phenomena). In topic modeling experiments on conversational telephone speech, we obtain modest perplexity reductions over an n-gram baseline and demonstrate topic-dependent keyword extraction that leads to a 13% (absolute) improvement in precision over TFIDF. We also show how keywords can be jointly learned for speakers, roles and topics in a study of Supreme Court oral arguments.
