Hybrid approach for text similarity detection in Vietnamese based on Sentence-BERT and WordNet

Published: 2022, Last Modified: 05 Nov 2023, ITCC 2022, Readers: Everyone
Abstract: In this paper, we explore the task of similarity detection, which determines whether two sentences have the same meaning. Although the task has been shown to be important in many natural language processing applications, little work has addressed it for Vietnamese. We present an approach based on the Sentence-BERT (SBERT) model. We leverage the pre-trained model, combine it with linguistic knowledge from WordNet, and evaluate the resulting system on two popular Vietnamese datasets: vnPara and VNPC. Our best model achieves an F1 score of 97.62% on vnPara and 95.31% on VNPC.
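
The abstract does not say how the SBERT output and the WordNet knowledge are combined, so the following is only a minimal sketch of one plausible hybrid scoring scheme, not the authors' method. It assumes sentence embeddings are already available as plain lists of floats (for example, produced by an SBERT model), and it uses a small hand-written synonym table as a stand-in for a WordNet lookup. The names `lexical_overlap`, `hybrid_score`, `is_paraphrase`, the weight `alpha`, and the threshold are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def lexical_overlap(tokens_a, tokens_b, synsets):
    """Jaccard overlap after mapping each token to a synonym-set id.

    `synsets` is a toy dict standing in for a WordNet lookup (assumption):
    tokens that share an id are treated as the same concept.
    """
    ids_a = {synsets.get(t, t) for t in tokens_a}
    ids_b = {synsets.get(t, t) for t in tokens_b}
    union = ids_a | ids_b
    return len(ids_a & ids_b) / len(union) if union else 0.0


def hybrid_score(emb_a, emb_b, tokens_a, tokens_b, synsets, alpha=0.7):
    """Weighted mix of embedding similarity and synonym-aware overlap.

    The linear combination and alpha=0.7 are illustrative assumptions.
    """
    semantic = cosine(emb_a, emb_b)
    lexical = lexical_overlap(tokens_a, tokens_b, synsets)
    return alpha * semantic + (1 - alpha) * lexical


def is_paraphrase(score, threshold=0.5):
    """Binary decision; the threshold would be tuned on held-out data."""
    return score >= threshold


# Toy synonym table: "ô_tô" and "xe_hơi" both mean "car" in Vietnamese.
toy_synsets = {"ô_tô": "car", "xe_hơi": "car"}
score = hybrid_score(
    [1.0, 0.0], [1.0, 0.0],
    ["tôi", "mua", "ô_tô"], ["tôi", "mua", "xe_hơi"],
    toy_synsets,
)
print(is_paraphrase(score))
```

In this sketch the lexical term rewards sentence pairs that use different surface words for the same concept, which is the kind of signal a WordNet resource could add on top of embedding similarity. In practice the weight and threshold would be chosen on a development split of a dataset such as vnPara or VNPC.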