Abstract: Semantic matching is a fundamental task in Natural Language Processing (NLP) and is widely used in information retrieval, recommendation, and other applications. Transformer-based pre-trained language models have achieved remarkable improvements in semantic matching. However, the transformer uses only a single attention mechanism, which may be suboptimal for semantic matching, a task that relies on modeling complex relationships. In this paper, we propose the Commix Dimensional Attention (CDA) framework, which enhances the ability of language models to capture relationships between sentence pairs from diverse aspects by exploiting and commixing four complementary attention mechanisms. Built upon the transformer architecture, the method adopts diverse attention functions to capture multiple types of interactive information and fuses them effectively with a well-designed self-interactive augmentation layer and a normalized aggregation layer. Specifically, the CDA language model comprises three key modules: 1) a commix dimensional attention module, 2) a self-interactive augmentation module, and 3) a normalized aggregation module. We conduct extensive experiments with the proposed CDA language model; results show that it achieves consistent improvements on 10 well-studied semantic matching datasets.
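The abstract does not specify which four attention mechanisms are commixed or how the augmentation and aggregation layers are defined, so the following is a minimal, hypothetical sketch of the general idea only: several attention variants (scaled dot-product, additive, multiplicative, and cosine are illustrative guesses) are computed over the same representations, then concatenated, fused, and normalized. All module and parameter names here are assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CommixDimensionalAttention(nn.Module):
    """Hypothetical sketch of the CDA idea: run several attention variants
    over the same token representations, then fuse and normalize them.
    The four variants below are illustrative; the abstract does not name them."""

    def __init__(self, dim):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        # parameters for the additive (Bahdanau-style) scoring function
        self.add_w = nn.Linear(2 * dim, dim)
        self.add_v = nn.Linear(dim, 1)
        # fuse the four attended views back to model width
        self.fuse = nn.Linear(4 * dim, dim)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):                       # x: (batch, seq, dim)
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        d = q.size(-1)

        # pairwise (query, key) grids for element-wise scoring functions
        qi = q.unsqueeze(2).expand(-1, -1, k.size(1), -1)
        kj = k.unsqueeze(1).expand(-1, q.size(1), -1, -1)

        # 1) scaled dot-product scores
        dot = q @ k.transpose(-2, -1) / d ** 0.5
        # 2) additive scores
        add = self.add_v(torch.tanh(self.add_w(torch.cat([qi, kj], -1)))).squeeze(-1)
        # 3) multiplicative (element-wise product) scores
        mul = (qi * kj).sum(-1)
        # 4) cosine-similarity scores
        cos = F.cosine_similarity(qi, kj, dim=-1)

        # attend with each score matrix, then concatenate and fuse the views
        views = [F.softmax(s, dim=-1) @ v for s in (dot, add, mul, cos)]
        out = self.fuse(torch.cat(views, dim=-1))

        # a residual connection stands in for the self-interactive
        # augmentation step; the paper's actual layer may differ
        return self.norm(x + out)
```

In this reading, each scoring function contributes a differently weighted view of the same value vectors, and the fusion projection plays the role of the normalized aggregation module described in the abstract.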
Paper Type: long
Research Area: Semantics: Sentence-level Semantics, Textual Inference and Other areas
Contribution Types: NLP engineering experiment
Keywords: semantic matching, transformer, commix dimensional attention