Towards Improving Topic Models with the BERT-based Neural Topic Encoder

Anonymous

16 Nov 2021 (modified: 05 May 2023) · ACL ARR 2021 November Blind Submission
Abstract: Neural Topic Models (NTMs) have been popular for mining a set of topics from a collection of documents. Recently, an emerging line of work combines NTMs with pre-trained language models such as BERT, aiming to use the contextual information of BERT to help train better NTMs. However, existing works in this direction either use the contextual information of pre-trained language models as the input of NTMs or align the outputs of the two kinds of models. In this paper, we study how to build deeper interactions between NTMs and pre-trained language models and propose a BERT-based neural topic encoder, which deeply integrates with the transformer layers of BERT. Our proposed encoder encodes both the bag-of-words (BoW) data and the word sequence of a document, which are complementary to each other for learning a better topic distribution for the document. The proposed encoder is a better alternative to the ones used in existing NTMs. Thanks to the in-depth integration with BERT, the proposed model achieves state-of-the-art performance in extensive experiments against many advanced models.
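To make the general idea concrete, the sketch below shows one plausible way to combine a BERT sequence encoding with a BoW encoding to infer a document-topic distribution in a VAE-style NTM encoder. It is not the paper's architecture (the abstract describes a deeper, layer-wise integration with BERT's transformer layers, whose details are not given here), and all class and parameter names such as BertBowTopicEncoder and num_topics are hypothetical.

```python
# Minimal sketch (assumed, not the authors' implementation): fuse a BERT
# [CLS] representation with a BoW encoding to produce a per-document
# topic distribution, in the style of VAE-based neural topic models.
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import BertModel


class BertBowTopicEncoder(nn.Module):
    def __init__(self, vocab_size: int, num_topics: int = 50, hidden: int = 256,
                 bert_name: str = "bert-base-uncased"):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)
        bert_dim = self.bert.config.hidden_size
        # BoW branch, as in standard VAE-based NTM encoders.
        self.bow_mlp = nn.Sequential(nn.Linear(vocab_size, hidden), nn.Softplus())
        # Fuse the contextual [CLS] vector with the BoW representation.
        self.fuse = nn.Linear(bert_dim + hidden, hidden)
        # Amortized variational parameters of the latent topic vector.
        self.mu = nn.Linear(hidden, num_topics)
        self.logvar = nn.Linear(hidden, num_topics)

    def forward(self, input_ids, attention_mask, bow):
        # Contextual document representation from BERT ([CLS] token).
        cls = self.bert(input_ids=input_ids,
                        attention_mask=attention_mask).last_hidden_state[:, 0]
        h = F.softplus(self.fuse(torch.cat([cls, self.bow_mlp(bow)], dim=-1)))
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization trick, then softmax to get the topic distribution.
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        theta = torch.softmax(z, dim=-1)
        return theta, mu, logvar
```

In this sketch the two branches are complementary in the sense the abstract suggests: the BoW branch captures global word-count statistics of the document, while the BERT branch contributes contextual information about the word sequence.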