Abstract: Neural topic models (NTMs) have become increasingly important in topic modeling due to their flexibility and extensibility, which have facilitated various advancements, including the incorporation of self-supervised learning. Self-supervised NTMs construct contrastive samples either in the document representation space or the topic representation space, aiming to draw anchors closer to positive samples while pushing them away from negatives. However, previous approaches often rely on tf-idf-based augmentation strategies, which produce contrastive samples with limited informativeness, constraining their effectiveness in enhancing topic quality. To address this limitation, we propose extending the predecessor model into an adversarial framework, where positive samples are dynamically generated in the embedding space by a trainable augmentation model. Our approach further integrates contextualized word embeddings extracted from large language models (LLMs), enhancing the semantic richness of the generated samples. Extensive experiments demonstrate that our model consistently outperforms existing methods in terms of topic coherence, validating the effectiveness of adversarial learning for self-supervised NTMs.
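To make the adversarial setup described in the abstract concrete, the following is a minimal sketch of the general idea: a trainable augmenter perturbs document representations to generate positive samples, and it is trained adversarially against a topic encoder under a contrastive (InfoNCE-style) objective. All module names, architectures, and hyperparameters here are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of adversarial positive-sample generation for a
# contrastive neural topic model. Names and settings are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Maps a document vector to topic proportions."""
    def __init__(self, vocab_size: int, num_topics: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(vocab_size, hidden), nn.ReLU(),
            nn.Linear(hidden, num_topics),
        )

    def forward(self, x):
        return F.softmax(self.net(x), dim=-1)  # topic representation

class Augmenter(nn.Module):
    """Trainable augmentation model: perturbs a document representation to
    produce a positive sample, replacing static tf-idf-based augmentation."""
    def __init__(self, dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(), nn.Linear(dim, dim))

    def forward(self, emb):
        return emb + self.net(emb)  # residual perturbation in embedding space

def info_nce(anchor, positive, temperature: float = 0.5):
    """Standard InfoNCE loss with in-batch negatives; matched
    anchor/positive pairs sit on the diagonal of the logit matrix."""
    a = F.normalize(anchor, dim=-1)
    p = F.normalize(positive, dim=-1)
    logits = a @ p.t() / temperature
    labels = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, labels)

vocab, topics, batch = 2000, 50, 32
enc, aug = Encoder(vocab, topics), Augmenter(vocab)
opt_enc = torch.optim.Adam(enc.parameters(), lr=1e-3)
opt_aug = torch.optim.Adam(aug.parameters(), lr=1e-3)

# Stand-in documents; in practice these could be bag-of-words vectors
# enriched with LLM-derived contextualized word embeddings.
docs = torch.rand(batch, vocab)

# Adversarial alternation: the augmenter seeks hard positives (maximizing
# the contrastive loss), while the encoder learns to stay robust to them.
loss_aug = -info_nce(enc(docs), enc(aug(docs)))
opt_aug.zero_grad(); loss_aug.backward(); opt_aug.step()

loss_enc = info_nce(enc(docs), enc(aug(docs)))
opt_enc.zero_grad(); loss_enc.backward(); opt_enc.step()
```

The alternating min-max updates mirror standard adversarial training: because the augmenter is optimized rather than fixed, the positives it produces adapt to the encoder over training, which is the informativeness advantage the abstract claims over static tf-idf augmentation.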
Paper Type: Long
Research Area: Machine Learning for NLP
Research Area Keywords: topic modeling, self-supervised learning, data augmentation, adversarial training
Languages Studied: English
Submission Number: 6616