Word Embedding-Based Topic Similarity Measures

Silvia Terragni, Elisabetta Fersini, Enza Messina

NLDB 2021 (modified: 14 Oct 2021)

Abstract: Topic models aim to discover a set of hidden themes in a text corpus. A user might want to identify the topics most similar to a given theme of interest. Several similarity and distance metrics can be adopted for this task. In this paper, we compare state-of-the-art topic similarity measures and propose novel metrics based on word embeddings. The proposed measures can overcome some limitations of the existing approaches, and they show good results on several topic performance measures across benchmark datasets.
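The abstract does not spell out the proposed formulas, so the following is only a minimal sketch of the general idea, not the authors' definitions. It contrasts a word-overlap baseline (Jaccard similarity between two topics' top words) with one simple embedding-based measure: the cosine similarity between the centroids of the topics' word vectors. The toy 3-dimensional `EMBEDDINGS` table and the function names are hypothetical; in practice the vectors would come from a pretrained embedding model.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical toy embeddings (illustration only); real use would load
# vectors from a pretrained word-embedding model.
EMBEDDINGS: Dict[str, List[float]] = {
    "football": [0.9, 0.1, 0.0],
    "soccer": [0.85, 0.15, 0.05],
    "goal": [0.8, 0.2, 0.1],
    "match": [0.7, 0.3, 0.0],
    "stock": [0.0, 0.1, 0.95],
    "market": [0.05, 0.2, 0.9],
}


def jaccard(topic_a: Sequence[str], topic_b: Sequence[str]) -> float:
    """Word-overlap baseline: |A & B| / |A | B| over the topics' top words."""
    a, b = set(topic_a), set(topic_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def centroid(topic: Sequence[str], emb: Dict[str, List[float]]) -> List[float]:
    """Average the vectors of the topic words that have an embedding."""
    vectors = [emb[w] for w in topic if w in emb]  # skip out-of-vocabulary words
    if not vectors:
        raise ValueError("no topic word has an embedding")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def centroid_similarity(topic_a: Sequence[str], topic_b: Sequence[str],
                        emb: Dict[str, List[float]]) -> float:
    """Embedding-based similarity: cosine between the two topic centroids."""
    return cosine(centroid(topic_a, emb), centroid(topic_b, emb))


if __name__ == "__main__":
    sports_a = ["football", "goal"]
    sports_b = ["soccer", "match"]
    finance = ["stock", "market"]
    # No shared words, so both Jaccard scores are 0.0.
    print(jaccard(sports_a, sports_b), jaccard(sports_a, finance))
    # Embeddings still rank the two sports topics as closer to each other.
    print(centroid_similarity(sports_a, sports_b, EMBEDDINGS))
    print(centroid_similarity(sports_a, finance, EMBEDDINGS))
```

The example illustrates one limitation that embedding-based measures can address: two topics with no words in common get a Jaccard score of 0 regardless of meaning, whereas comparing word vectors still separates related topics (the two sports topics) from unrelated ones (sports vs. finance).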
