Revisiting Topic-Guided Language Models

Published: 02 Dec 2023, Last Modified: 02 Dec 2023, Accepted by TMLR
Abstract: A recent line of work in natural language processing has aimed to combine language models and topic models. These \textit{topic-guided language models} augment neural language models with topic models, unsupervised learning methods that can discover document-level patterns of word use. This paper compares the effectiveness of these methods in a standardized setting. We study four topic-guided language models and two baselines, evaluating the held-out predictive performance of each model on four corpora. Surprisingly, we find that \textit{none of these methods outperform a standard LSTM language model baseline}, and most fail to learn good topics. Further, we train a probe of the neural language model that shows that the baseline's hidden states already encode topic information. We make public all code used for this study.
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Length: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=BgIJ8V4jLh
Changes Since Last Submission: Previously desk rejected due to incorrect font -- font has been fixed.
Code: https://github.com/carolinazheng/revisiting-tglms
Assigned Action Editor: ~Tao_Qin1
Submission Number: 1244