Lifelong Hierarchical Topic Modeling via Nonparametric Word Embedding Clustering

Published: 01 Jan 2024, Last Modified: 10 Jan 2025, Venue: ECML/PKDD (8) 2024, License: CC BY-SA 4.0
Abstract: Hierarchical topic models, which mine topics representing latent semantics and organize them into hierarchies, have been widely developed. However, existing methods often assume a fixed topic hierarchy, which leads to poor performance on document streams. Prior knowledge of the topic structure helps hierarchical topic modeling, but obtaining it manually is costly. To address these issues, we propose a lifelong hierarchical topic model that automatically learns a flexible topic structure through nonparametric word embedding clustering. We also design a knowledge base in the form of word hierarchies, which serves as automatically extracted prior knowledge to support topic structure generation, and we update this knowledge base by accumulating structure information from the past. Experiments on real-world datasets demonstrate that our method can generate a rational, flexible, and coherent topic structure. Lifelong learning evaluations also show that our method is less affected by catastrophic forgetting than baseline models. Our code is available at https://github.com/yjx5050ptol/LNCHTM.
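To make "nonparametric word embedding clustering" concrete, here is a minimal sketch of the general idea: the number of clusters is not fixed in advance but grows when a word vector is too far from every existing cluster. The sketch uses a DP-means-style rule with a plain Euclidean distance threshold, which is a stand-in chosen for illustration and not the algorithm from the paper. The function name `dp_means`, the threshold `lam`, and the toy 2-D "embeddings" are all assumptions.

```python
import math

def dp_means(points, lam, max_iter=50):
    """DP-means-style clustering (illustrative stand-in, not the paper's method).

    A point opens a new cluster when its Euclidean distance to every
    existing centroid exceeds `lam`, so the cluster count is data-driven.
    """
    centroids = [list(points[0])]          # start with a single cluster
    assign = [0] * len(points)
    for _ in range(max_iter):
        changed = False
        for i, p in enumerate(points):
            dists = [math.dist(p, c) for c in centroids]
            k = min(range(len(dists)), key=dists.__getitem__)
            if dists[k] > lam:             # too far from everything: new cluster
                centroids.append(list(p))
                k = len(centroids) - 1
            if assign[i] != k:
                assign[i] = k
                changed = True
        # Move each centroid to the mean of its members (empty ones stay put).
        for k in range(len(centroids)):
            members = [points[i] for i in range(len(points)) if assign[i] == k]
            if members:
                centroids[k] = [sum(x) / len(members) for x in zip(*members)]
        if not changed:
            break
    return assign, centroids

# Hypothetical toy "word embeddings" (2-D for readability).
words = ["goal", "match", "team", "vote", "election", "senate"]
vectors = [(1.0, 0.1), (0.9, 0.0), (1.1, -0.1),
           (0.0, 1.0), (0.1, 0.9), (-0.1, 1.1)]

labels, centroids = dp_means(vectors, lam=0.5)

# Group words by cluster label to read off the discovered "topics".
topics = {}
for word, label in zip(words, labels):
    topics.setdefault(label, []).append(word)
print(topics)
```

On this toy input the threshold separates the two word groups without the cluster count being specified. Real embeddings are high-dimensional, so the distance measure and threshold would need to be chosen accordingly.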