Self-Supervised Learning of Contextualized Neural Topic Models with VIC Regularization

ACL ARR 2025 February Submission8025 Authors

16 Feb 2025 (modified: 09 May 2025) · ACL ARR 2025 February Submission · CC BY 4.0
Abstract: Topic modeling analyzes large document collections to uncover underlying latent topics, with applications in document retrieval, classification, and beyond. Recently, neural topic models, which leverage neural networks for topic extraction, have gained attention, particularly with the integration of contextual embeddings from sentence encoders. Self-supervised learning, which uses pseudo-labels derived from the data itself, has shown promise in this domain. Variance-Invariance-Covariance (VIC) Regularization, originally introduced for multimodal analysis, has been shown to be effective for neural topic models that use only word-based embeddings; however, its applicability to neural topic models incorporating contextual embeddings remains unexplored. This study proposes a self-supervised neural topic model that incorporates both VIC Regularization and contextual embeddings. Our experimental results indicate improved topic coherence compared to conventional neural topic models.
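The abstract does not spell out the regularizer, so as background, a minimal NumPy sketch of a generic VIC-style loss (following the variance, invariance, and covariance terms of VICReg-type objectives) is given below. The function name `vic_regularization`, the hinge threshold `gamma`, and the use of two augmented embedding batches are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def vic_regularization(z_a, z_b, gamma=1.0, eps=1e-4):
    """Compute VIC-style loss terms for two (n, d) batches of embeddings.

    z_a, z_b are paired representations of the same documents under two
    views (e.g. word-based and contextual embeddings); names and the
    pairing scheme here are illustrative, not taken from the paper.
    """
    n, d = z_a.shape

    # Invariance: paired embeddings of the same document should agree.
    invariance = np.mean(np.sum((z_a - z_b) ** 2, axis=1))

    # Variance: hinge loss keeping each dimension's std above gamma,
    # which discourages collapse to a constant representation.
    def variance_term(z):
        std = np.sqrt(z.var(axis=0) + eps)
        return np.mean(np.maximum(0.0, gamma - std))

    variance = variance_term(z_a) + variance_term(z_b)

    # Covariance: penalize off-diagonal entries of the covariance
    # matrix so embedding dimensions carry decorrelated information.
    def covariance_term(z):
        zc = z - z.mean(axis=0)
        cov = (zc.T @ zc) / (n - 1)
        off_diag = cov - np.diag(np.diag(cov))
        return np.sum(off_diag ** 2) / d

    covariance = covariance_term(z_a) + covariance_term(z_b)
    return invariance, variance, covariance
```

In a training loop the three terms would typically be combined as a weighted sum and added to the topic model's reconstruction objective; the weights are hyperparameters.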
Paper Type: Long
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: topic modeling, self-supervised learning, contextual embedding
Languages Studied: English
Submission Number: 8025