MaCSE: Multi-Agent Ranking Distillation for Contrastive Learning of Sentence Embeddings

Published: 2025, Last Modified: 22 Jan 2026 · IJCNN 2025 · CC BY-SA 4.0
Abstract: Sentence embedding models are typically trained with Contrastive Learning (CL), which pulls semantically similar sentences closer together and pushes dissimilar ones apart. Recent studies have shown that a multi-teacher ranking distillation approach, which assigns fine-grained rankings to sentences, yields smoother sentence similarity representations and higher-quality sentence embeddings. However, the effectiveness of distillation can be limited by the capacity of the student model: a simple student with fewer parameters may struggle to approximate a highly complex teacher, potentially overfitting to certain datasets or specific aspects of the task. To address this, we propose MaCSE, a multi-agent ranking distillation framework that dynamically selects and optimizes teacher model contributions across training stages. MaCSE employs a Centralized Training with Decentralized Execution (CTDE) paradigm, enabling collaborative agent interactions to adaptively adjust teacher fusion weights based on training dynamics. Experimental results on Semantic Textual Similarity and transfer tasks demonstrate that MaCSE outperforms most existing baselines and even rivals methods that use large language models for sentence representation. Our implementation is available on GitHub.
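The abstract combines two training signals: an in-batch contrastive objective and a weighted sum of teacher ranking-distillation losses, with fusion weights adjusted during training. The NumPy sketch below illustrates both under stated assumptions; the function names, the InfoNCE form of the contrastive loss, the KL-based ranking distillation, and the fixed fusion logits are all illustrative, not MaCSE's actual implementation (which learns the weights via CTDE agents).

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def info_nce(anchors, positives, temperature=0.05):
    """In-batch contrastive loss: each anchor's matched positive sits on the
    diagonal of the similarity matrix; other rows act as negatives."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    sim = a @ p.T / temperature
    # log-softmax over each row, take the diagonal (positive-pair) entries
    log_probs = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

def fused_ranking_distillation(student_scores, teacher_scores_list, fusion_logits):
    """Hypothetical multi-teacher distillation: KL divergence between each
    teacher's softmax-normalized similarity ranking and the student's,
    blended with softmax fusion weights. MaCSE's agents would adapt the
    fusion_logits over training; here they are fixed for illustration."""
    weights = softmax(np.asarray(fusion_logits, dtype=float))
    s = softmax(np.asarray(student_scores, dtype=float))
    loss = 0.0
    for w, t_scores in zip(weights, teacher_scores_list):
        t = softmax(np.asarray(t_scores, dtype=float))
        loss += w * np.sum(t * (np.log(t) - np.log(s)))
    return loss
```

A student whose similarity ranking already matches every teacher incurs zero distillation loss, so the gradient pressure comes only from teachers the student disagrees with, scaled by their fusion weights.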