Abstract: Neural topic models (NTMs) have become increasingly important in topic modeling due to their flexibility and extensibility, which have facilitated various advancements, including the incorporation of self-supervised learning. Self-supervised NTMs construct contrastive samples either in the document representation space or the topic representation space, aiming to draw anchors closer to positive samples while pushing them away from negatives. However, previous approaches often rely on tf-idf-based augmentation strategies, which produce contrastive samples with limited informativeness, constraining their effectiveness in enhancing topic quality. To address this limitation, we propose extending the predecessor model into an adversarial framework, where positive samples are dynamically generated in the embedding space by a trainable augmentation model. Our approach further integrates contextualized word embeddings extracted from large language models (LLMs), enhancing the semantic richness of the generated samples. Extensive experiments demonstrate that our model consistently outperforms existing methods in terms of topic coherence, validating the effectiveness of adversarial learning for self-supervised NTMs.
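To make the adversarial setup described in the abstract concrete, the following is a minimal sketch of the general idea: a trainable augmenter perturbs document representations to generate positive samples, and it is trained adversarially against a topic encoder under a contrastive (InfoNCE-style) objective. All module names, architectures, and hyperparameters here are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of adversarial positive-sample generation for a
# contrastive neural topic model. Names and settings are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Maps a document vector to topic proportions."""
    def __init__(self, vocab_size: int, num_topics: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(vocab_size, hidden), nn.ReLU(),
            nn.Linear(hidden, num_topics),
        )

    def forward(self, x):
        return F.softmax(self.net(x), dim=-1)  # topic representation

class Augmenter(nn.Module):
    """Trainable augmentation model: perturbs a document representation to
    produce a positive sample, replacing static tf-idf-based augmentation."""
    def __init__(self, dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(), nn.Linear(dim, dim))

    def forward(self, emb):
        return emb + self.net(emb)  # residual perturbation in embedding space

def info_nce(anchor, positive, temperature: float = 0.5):
    """Standard InfoNCE loss with in-batch negatives; matched
    anchor/positive pairs sit on the diagonal of the logit matrix."""
    a = F.normalize(anchor, dim=-1)
    p = F.normalize(positive, dim=-1)
    logits = a @ p.t() / temperature
    labels = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, labels)

vocab, topics, batch = 2000, 50, 32
enc, aug = Encoder(vocab, topics), Augmenter(vocab)
opt_enc = torch.optim.Adam(enc.parameters(), lr=1e-3)
opt_aug = torch.optim.Adam(aug.parameters(), lr=1e-3)

# Stand-in documents; in practice these could be bag-of-words vectors
# enriched with LLM-derived contextualized word embeddings.
docs = torch.rand(batch, vocab)

# Adversarial alternation: the augmenter seeks hard positives (maximizing
# the contrastive loss), while the encoder learns to stay robust to them.
loss_aug = -info_nce(enc(docs), enc(aug(docs)))
opt_aug.zero_grad(); loss_aug.backward(); opt_aug.step()

loss_enc = info_nce(enc(docs), enc(aug(docs)))
opt_enc.zero_grad(); loss_enc.backward(); opt_enc.step()
```

The alternating min-max updates mirror standard adversarial training: because the augmenter is optimized rather than fixed, the positives it produces adapt to the encoder over training, which is the informativeness advantage the abstract claims over static tf-idf augmentation.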
Paper Type: Long
Research Area: Machine Learning for NLP
Research Area Keywords: topic modeling, self-supervised learning, data augmentation, adversarial training
Languages Studied: English
Submission Number: 6616