Revisiting Topic-Guided Language Models

Carolina Zheng; Keyon Vafa; David Blei

Published: 02 Dec 2023, Last Modified: 17 Sept 2024. Accepted by TMLR.

Abstract: A recent line of work in natural language processing has aimed to combine language models and topic models. These topic-guided language models augment neural language models with topic models, unsupervised learning methods that can discover document-level patterns of word use. This paper compares the effectiveness of these methods in a standardized setting. We study four topic-guided language models and two baselines, evaluating the held-out predictive performance of each model on four corpora. Surprisingly, we find that none of these methods outperform a standard LSTM language model baseline, and most fail to learn good topics. Further, we train a probe of the neural language model that shows that the baseline's hidden states already encode topic information. We make public all code used for this study.

Submission Length: Regular submission (no more than 12 pages of main content)

Previous TMLR Submission Url: https://openreview.net/forum?id=BgIJ8V4jLh

Changes Since Last Submission: Previously desk rejected due to incorrect font -- font has been fixed.

Code: https://github.com/carolinazheng/revisiting-tglms

Assigned Action Editor: ~Tao_Qin1

License: Creative Commons Attribution 4.0 International (CC BY 4.0)

Submission Number: 1244
