A Self-supervised Neural Topic Model Extended with Adversarial Data Augmentation

ACL ARR 2025 February Submission5370 Authors

16 Feb 2025 (modified: 09 May 2025) · CC BY 4.0
Abstract: Neural topic models (NTMs) have advanced topic modeling through their flexibility, enabling self-supervised learning with contrastive samples at the document or topic representation level. However, prior TF-IDF-based augmentation strategies provide limited guidance during training. To address this, we propose an adversarial framework with a trainable augmentation model that generates positive samples in the embedding space, leveraging contextualized word embeddings from large language models (LLMs). Experimental results demonstrate that our model surpasses previous approaches in topic coherence, highlighting the effectiveness of adversarial data augmentation in improving topic modeling performance.
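The abstract does not specify the contrastive objective used over document representations. As a generic illustration only (not the authors' exact formulation, and with the perturbation step standing in for their trainable augmentation model), a contrastive setup over document embeddings and their augmented positives might use an InfoNCE-style loss:

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.07):
    """InfoNCE contrastive loss: row i of `positives` is the positive
    sample for row i of `anchors`; all other rows serve as negatives."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                   # (N, N) cosine similarities
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))              # -log p(positive | anchor)

rng = np.random.default_rng(0)
docs = rng.normal(size=(8, 16))                      # hypothetical document embeddings
# Stand-in for a learned augmentation: in the paper's setting this
# perturbation would instead be produced by the trainable adversary.
positives = docs + 0.01 * rng.normal(size=docs.shape)
loss = info_nce_loss(docs, positives)
```

In an adversarial variant, the augmentation model would be trained to make this loss harder to minimize, while the topic model learns representations that remain invariant to those perturbations.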
Paper Type: Short
Research Area: Machine Learning for NLP
Research Area Keywords: topic modeling, self-supervised learning, data augmentation, adversarial training
Languages Studied: English
Submission Number: 5370