Is Automated Topic Model Evaluation Broken? The Incoherence of Coherence

Alexander Hoyle; Pranav Goel; Andrew Hian-Cheong; Denis Peskov; Jordan Lee Boyd-Graber; Philip Resnik

Is Automated Topic Model Evaluation Broken? The Incoherence of Coherence

Alexander Hoyle, Pranav Goel, Andrew Hian-Cheong, Denis Peskov, Jordan Lee Boyd-Graber, Philip Resnik

Published: 09 Nov 2021, Last Modified: 26 May 2025NeurIPS 2021 SpotlightReaders: Everyone

Keywords: topic model, topic model evaluation, npmi, topic coherence, human evaluation, metric, evaluation, validation, crowdsourcing, model comparison, automatic metric, automated metric

Abstract: Topic model evaluation, like evaluation of other unsupervised methods, can be contentious. However, the field has coalesced around automated estimates of topic coherence, which rely on the frequency of word co-occurrences in a reference corpus. Contemporary neural topic models surpass classical ones according to these metrics. At the same time, topic model evaluation suffers from a validation gap: automated coherence, developed for classical models, has not been validated using human experimentation for neural models. In addition, a meta-analysis of topic modeling literature reveals a substantial standardization gap in automated topic modeling benchmarks. To address the validation gap, we compare automated coherence with the two most widely accepted human judgment tasks: topic rating and word intrusion. To address the standardization gap, we systematically evaluate a dominant classical model and two state-of-the-art neural models on two commonly used datasets. Automated evaluations declare a winning model when corresponding human evaluations do not, calling into question the validity of fully automatic evaluations independent of human judgments.

Code Of Conduct: I certify that all co-authors of this work have read and commit to adhering to the NeurIPS Statement on Ethics, Fairness, Inclusivity, and Code of Conduct.

TL;DR: Yes, automated topic evaluation is broken.

Supplementary Material: pdf

Code: https://www.github.com/ahoho/topics

Community Implementations: [![CatalyzeX](/images/catalyzex_icon.svg) 5 code implementations](https://www.catalyzex.com/paper/is-automated-topic-model-evaluation-broken/code)

13 Replies

Loading