SubTST: A Combination of Sub-word Latent Topics and Sentence Transformer for Semantic Similarity Detection

Published: 01 Jan 2022, Last Modified: 20 May 2025 · ICAART (3) 2022 · CC BY-SA 4.0
Abstract: Topic information has been shown to be useful for semantic similarity detection. In this paper, we present a novel and efficient method for incorporating topic information into Transformer-based models, called the Sub-word Latent Topic and Sentence Transformer (SubTST). The proposed model inherits the advantages of the SBERT (Reimers and Gurevych, 2019) architecture and learns latent topics at the sub-word level rather than at the document or word level, as in previous work. The experimental results illustrate the effectiveness of the proposed method, which significantly outperforms SBERT and tBERT (Peinelt et al., 2020), two state-of-the-art methods for semantic similarity detection, on most of the benchmark datasets.
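As a rough illustration of the idea the abstract describes (sub-word-level latent topics fused into an SBERT-style sentence encoder), a minimal sketch might look like the following. This is not the authors' released implementation: the class name, the per-sub-word topic embedding table, the concatenate-and-project fusion layer, and the topic count are all assumptions made here for illustration; only the SBERT-style mean pooling comes from the cited architecture.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer


class SubTSTSketch(nn.Module):
    """Hypothetical sketch: fuse sub-word topic vectors with an
    SBERT-style Transformer encoder (not the paper's released code)."""

    def __init__(self, model_name="bert-base-uncased", num_topics=80):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        # Assumption: one learned latent-topic vector per sub-word vocab entry.
        self.topic_table = nn.Embedding(self.encoder.config.vocab_size, num_topics)
        # Assumption: fuse token and topic features with a linear projection.
        self.fuse = nn.Linear(hidden + num_topics, hidden)

    def forward(self, input_ids, attention_mask):
        token_emb = self.encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state                      # (batch, seq, hidden)
        topic_emb = self.topic_table(input_ids)  # (batch, seq, num_topics)
        fused = self.fuse(torch.cat([token_emb, topic_emb], dim=-1))
        # Mean pooling over non-padding tokens, as in SBERT.
        mask = attention_mask.unsqueeze(-1).float()
        return (fused * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)


# Usage: compare two sentences by cosine similarity of pooled embeddings.
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = SubTSTSketch()
batch = tok(
    ["A man is playing a guitar.", "Someone plays guitar."],
    padding=True, return_tensors="pt",
)
emb = model(batch["input_ids"], batch["attention_mask"])
sim = torch.nn.functional.cosine_similarity(emb[0], emb[1], dim=0)
```

The sentence-pair scoring at the end mirrors the SBERT setup the model inherits; how SubTST actually trains its topic table and fuses it with the encoder is specified in the paper itself.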