NCSE: Neighbor Contrastive Learning for Unsupervised Sentence Embeddings

Published: 01 Jan 2024, Last Modified: 15 May 2025, IJCNN 2024, CC BY-SA 4.0
Abstract: Unsupervised sentence embedding methods based on contrastive learning have gained attention for effectively representing sentences in natural language processing. Retrieving additional samples via a nearest-neighbor approach can enhance the model's ability to learn relevant semantics and distinguish sentences. However, previous related research mainly retrieves neighboring samples within either a single batch or the global corpus, which may prevent the model from capturing effective semantic information or may incur excessive time cost. Furthermore, previous methods use retrieved neighbor samples as hard negatives. We argue that nearest-neighbor samples contain relevant semantic information, and treating them as hard negatives risks losing valuable semantic knowledge. In this work, we introduce Neighbor Contrastive learning for unsupervised Sentence Embeddings (NCSE), which combines contrastive learning with the nearest-neighbor approach. Specifically, we create a candidate set that stores sentence embeddings across multiple batches. Retrieving from the candidate set ensures sufficient samples, making it easier for the model to learn relevant semantics. Using retrieved nearest-neighbor samples as positives and applying a self-attention mechanism to aggregate a sample with its neighbors encourages the model to learn relevant semantics from multiple neighbors. Experiments on the semantic textual similarity (STS) task demonstrate our method's effectiveness in sentence embedding learning.
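A minimal sketch of the idea described in the abstract, not the authors' implementation: a cross-batch candidate set (memory bank) of embeddings, top-k nearest-neighbor retrieval, self-attention aggregation of an anchor with its neighbors, and an InfoNCE-style loss that treats the aggregated neighbor view as the positive. All class names, sizes, and hyperparameters below are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

class CandidateSet:
    """FIFO queue of L2-normalized sentence embeddings collected across batches."""
    def __init__(self, dim: int, size: int = 4096):
        self.bank = F.normalize(torch.randn(size, dim), dim=-1)  # placeholder init
        self.ptr = 0

    @torch.no_grad()
    def enqueue(self, emb: torch.Tensor):
        emb = F.normalize(emb.detach(), dim=-1)
        n = emb.size(0)
        idx = (self.ptr + torch.arange(n)) % self.bank.size(0)
        self.bank[idx] = emb
        self.ptr = (self.ptr + n) % self.bank.size(0)

    def topk_neighbors(self, query: torch.Tensor, k: int = 8) -> torch.Tensor:
        """Return the k most similar stored embeddings for each query."""
        sim = F.normalize(query, dim=-1) @ self.bank.t()   # (B, size)
        idx = sim.topk(k, dim=-1).indices                  # (B, k)
        return self.bank[idx]                              # (B, k, dim)

def aggregate_with_neighbors(anchor: torch.Tensor, neighbors: torch.Tensor,
                             attn: torch.nn.MultiheadAttention) -> torch.Tensor:
    """Self-attention over [anchor; neighbors]; the anchor's output token
    serves as the neighbor-aware representation."""
    tokens = torch.cat([anchor.unsqueeze(1), neighbors], dim=1)  # (B, 1+k, dim)
    out, _ = attn(tokens, tokens, tokens)
    return out[:, 0]                                             # (B, dim)

def neighbor_contrastive_loss(anchor: torch.Tensor, positives: torch.Tensor,
                              temperature: float = 0.05) -> torch.Tensor:
    """InfoNCE: the aggregated neighbor view is the positive; other
    in-batch samples serve as negatives."""
    a = F.normalize(anchor, dim=-1)
    p = F.normalize(positives, dim=-1)
    logits = a @ p.t() / temperature                # (B, B)
    labels = torch.arange(a.size(0))
    return F.cross_entropy(logits, labels)

# Toy usage: random "sentence embeddings" stand in for encoder outputs.
dim, batch = 768, 32
candidates = CandidateSet(dim)
attn = torch.nn.MultiheadAttention(embed_dim=dim, num_heads=8, batch_first=True)
emb = torch.randn(batch, dim)                       # encoder(sentences)
neighbors = candidates.topk_neighbors(emb, k=8)     # retrieve across batches
neighbor_view = aggregate_with_neighbors(emb, neighbors, attn)
loss = neighbor_contrastive_loss(emb, neighbor_view)
candidates.enqueue(emb)                             # update the candidate set
print(float(loss))
```

In an actual training loop the embeddings would come from a sentence encoder such as BERT, and the candidate set would be refreshed each step so that retrieval spans recent batches rather than the whole corpus.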