Enhancing Transformer-based Semantic Matching for Few-shot Learning through Weakly Contrastive Pre-training
Abstract: Semantic text matching measures the semantic similarity between two distinct texts and is widely applied in search and ranking scenarios. In recent years, pre-trained models based on the Transformer architecture have demonstrated powerful semantic representation capabilities and have become the mainstream approach to text representation. The pipeline of fine-tuning pre-trained language models on downstream semantic matching tasks has achieved promising results and seen widespread adoption. However, practical downstream scenarios often face severe challenges in data quality and quantity: obtaining large numbers of high-quality labeled samples is difficult. Research on enhancing pre-trained models for few-shot semantic text matching remains limited. This paper therefore proposes a general enhancement scheme for few-shot semantic text matching. Specifically, we propose an enhanced Transformer-based semantic matching method for few-shot learning through weakly contrastive pre-training, named EBSIM. First, considering the characteristics of semantic text matching tasks, we design a simple and cost-effective data augmentation method for constructing weakly supervised samples. Then, we design an alignment-oriented contrastive learning objective that achieves effective semantic matching by optimizing bidirectional semantic perception between the constructed texts. We conduct comprehensive experiments on five Chinese and English semantic text matching datasets using various Transformer-based pre-trained models. The experimental results confirm that our proposed method significantly improves model performance on semantic text matching tasks. Further ablation experiments and case studies validate the effectiveness of our approach. Our code and data will be made publicly available at a later stage.
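The abstract does not give EBSIM's exact loss formulation, but a bidirectional, in-batch contrastive objective of the kind it describes is commonly implemented as a symmetric InfoNCE loss. The sketch below is a minimal, hedged illustration under that assumption; the function name `bidirectional_contrastive_loss`, the temperature value, and the use of in-batch negatives are illustrative choices, not the authors' published method.

```python
import torch
import torch.nn.functional as F

def bidirectional_contrastive_loss(z_a: torch.Tensor,
                                   z_b: torch.Tensor,
                                   temperature: float = 0.05) -> torch.Tensor:
    """Symmetric InfoNCE-style loss over a batch of paired embeddings.

    z_a, z_b: (batch, dim) encoder outputs for the two views of each
    text, e.g. an original sentence and its weakly augmented counterpart.
    Matched pairs sit on the diagonal of the similarity matrix; all other
    in-batch pairs serve as negatives.
    """
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    # Cosine similarities between every a-embedding and every b-embedding.
    sim = z_a @ z_b.t() / temperature
    labels = torch.arange(sim.size(0), device=sim.device)
    loss_ab = F.cross_entropy(sim, labels)      # a -> b direction
    loss_ba = F.cross_entropy(sim.t(), labels)  # b -> a direction
    # Averaging both directions optimizes bidirectional semantic perception.
    return (loss_ab + loss_ba) / 2
```

In this reading, "bidirectional semantic perception" corresponds to scoring the similarity matrix in both directions (a→b and b→a) rather than only one, which keeps the objective symmetric with respect to the two constructed texts.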
Primary Subject Area: [Generation] Multimedia Foundation Models
Secondary Subject Area: [Experience] Multimedia Applications
Relevance To Conference: This research contributes to enhancing text-based semantic matching within the multimodal domain. Specifically, it leverages textual-modality information to improve the effectiveness of text semantic matching, which is widely applicable in scenarios such as information retrieval and intelligent question answering. The proposed method is a low-cost, two-stage pre-training approach that can be regarded as a fine-tuning scheme. The current work validates the approach using text information from readily available raw data. It can be readily extended to visual scenarios and to multimodal scenarios involving visual and speech modalities.
Submission Number: 3339