Multi-Stage Contrastive Learning with Joint Domain-Specific Masked Supervision for Domain Adaptation of Sentence Embedding Models
Abstract: We present a multi-stage contrastive learning framework for domain adaptation of sentence embedding models, incorporating joint domain-specific masked supervision. Our approach addresses the challenges of adapting large-scale general-domain sentence embedding models to specialized domains. By jointly optimizing masked language modeling (MLM) and contrastive objectives within a unified training pipeline, our method enables effective learning of domain-relevant representations while preserving the robust semantic discrimination properties of the original model. We empirically validate our approach on both high-resource and low-resource domains, achieving improvements of up to 13.4% in NDCG@10 over strong general-domain baselines. Comprehensive ablation studies further demonstrate the effectiveness of each component, highlighting the importance of balanced joint supervision and staged adaptation.
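To make the joint objective described in the abstract concrete, the following is a minimal sketch (not the authors' released code) of a weighted sum of an in-batch InfoNCE contrastive loss and a domain-specific MLM loss. The function name `joint_loss`, the weighting scalar `mlm_weight`, and the temperature value are illustrative assumptions; the paper's actual balancing scheme and staging are not specified here.

```python
import torch
import torch.nn.functional as F

def joint_loss(anchor_emb, positive_emb, mlm_logits, mlm_labels,
               temperature=0.05, mlm_weight=0.1):
    """Hypothetical joint objective: contrastive + weighted MLM loss."""
    # InfoNCE with in-batch negatives: each anchor's positive is the
    # matching row of the batch; all other rows serve as negatives.
    anchor = F.normalize(anchor_emb, dim=-1)
    positive = F.normalize(positive_emb, dim=-1)
    logits = anchor @ positive.T / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    contrastive = F.cross_entropy(logits, targets)

    # MLM loss over masked positions only; unmasked positions carry
    # the conventional ignore label of -100.
    mlm = F.cross_entropy(mlm_logits.view(-1, mlm_logits.size(-1)),
                          mlm_labels.view(-1), ignore_index=-100)

    # The mlm_weight trade-off reflects the "balanced joint supervision"
    # the abstract highlights; its value here is an assumption.
    return contrastive + mlm_weight * mlm
```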
Paper Type: Long
Research Area: Machine Learning for NLP
Research Area Keywords: sentence embeddings, contrastive learning, self-supervised learning, transfer learning, domain adaptation, representation learning, low-resource, pre-training, masked language modeling
Contribution Types: NLP engineering experiment, Approaches to low-resource settings, Publicly available software and/or pre-trained models
Languages Studied: English
Submission Number: 726