Harnessing Dimension-level Contrastive Learning and Information Compensation Mechanism for Sentence Embedding Enhancement
Abstract: Although unsupervised sentence embedding learning has achieved great success through the construction of positive samples and instance-level contrastive learning (ICL), the learned sentence embeddings can be over-compressed or suffer from dimensional pollution due to noisy data augmentation and an unconstrained ICL learning process. To address these issues, we design a novel sentence embedding enhancement method, MSSE, which introduces an information compensation mechanism (ICM) and a dimension-level contrastive learning mechanism (DCM). ICM is motivated by the information bottleneck principle and prevents excessive compression during representation learning. DCM constrains the ICL learning process and reduces information contamination across dimensions. Experimental results demonstrate that our method outperforms competitive baselines on seven STS tasks under unsupervised, few-shot, and supervised sentence embedding learning.
Paper Type: Long
Research Area: Semantics: Lexical and Sentence-Level
Research Area Keywords: semantic textual similarity; phrase/sentence embedding
Languages Studied: English
Submission Number: 1169