A Contrastive Learning Approach to Paraphrase Identification

Published: 2025, Last Modified: 25 Jan 2026, CASE 2025, CC BY-SA 4.0
Abstract: This paper focuses on paraphrase identification (PI), a fundamental NLP task that aims to determine whether a pair of sentences conveys the same or similar meaning. Despite the significant progress of current pre-trained language models on the PI task, the inherent ambiguity of natural language, stemming from the polysemous nature of words, makes assessing semantic similarity challenging, so capturing intricate relationships between sentences requires further enhancement. In light of this challenge, we propose a method that utilizes contrastive learning to produce sentence embeddings optimized for discriminating between sentences with similar and dissimilar semantic meanings. Specifically, the framework trains a BERT model on modified Natural Language Inference (NLI) datasets with two-level contrastive learning, yielding a 2-Level-CLPI-BERT model whose sentence representations are tailored to the PI task. Experiments on four PI datasets demonstrate that the proposed model outperforms state-of-the-art methods in the intra-dataset setting. Furthermore, a cross-dataset evaluation substantiates the generalizability of the 2-Level-CLPI-BERT embeddings.
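The abstract does not specify the training objective, so the sketch below is only a generic illustration of contrastive fine-tuning of a BERT encoder on NLI-derived triplets (premise, entailment, contradiction), in the spirit of supervised SimCSE with an InfoNCE loss. The model checkpoint, mean pooling, temperature value, and batch construction are all assumptions for illustration; this is not the authors' two-level objective or the 2-Level-CLPI-BERT implementation.

```python
# Hypothetical sketch: contrastive fine-tuning of a BERT encoder on NLI
# triplets, using entailments as positives and contradictions as hard
# negatives. All design choices here are illustrative assumptions.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

def embed(sentences):
    """Encode sentences and mean-pool token embeddings (one common choice)."""
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    out = encoder(**batch).last_hidden_state            # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1)        # (B, T, 1)
    return (out * mask).sum(1) / mask.sum(1)            # (B, H)

def contrastive_loss(premises, entailments, contradictions, tau=0.05):
    """InfoNCE over in-batch negatives plus hard negatives from contradictions."""
    a = embed(premises)          # anchors        (B, H)
    p = embed(entailments)       # positives      (B, H)
    n = embed(contradictions)    # hard negatives (B, H)
    a, p, n = (F.normalize(x, dim=-1) for x in (a, p, n))
    # Similarity of each anchor to every positive and every hard negative.
    sims = torch.cat([a @ p.T, a @ n.T], dim=1) / tau   # (B, 2B)
    # The matching positive for anchor i sits at column i.
    labels = torch.arange(a.size(0))
    return F.cross_entropy(sims, labels)

loss = contrastive_loss(
    ["A man is playing a guitar."],
    ["Someone is making music."],
    ["The room is completely silent."],
)
loss.backward()  # gradients flow into the BERT encoder
```

In this kind of setup, the other sentences in the batch act as additional in-batch negatives, which is what lets a single cross-entropy over the similarity matrix pull paraphrase-like pairs together while pushing semantically dissimilar pairs apart.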