Abstract: Unsupervised contrastive learning of high-quality sentence representations has gained widespread attention in recent years. However, existing dropout-based data augmentation methods, such as Unsup-SimCSE [13], introduce only minimal semantic changes, which can lead to the exclusion of positive samples and thus hinder alignment. To alleviate this problem, we propose Soft Positive Contrastive Sentence Embeddings (SPCSE), a novel approach that leverages soft positives generated by diverse discrete data augmentation methods. By incorporating soft positives, SPCSE aims to improve the alignment between positive samples and anchors in the representation space. Experimental results on seven Semantic Textual Similarity (STS) tasks demonstrate that SPCSE significantly improves the alignment of positive samples and achieves better overall performance than Unsup-SimCSE.