Bayesian Iterative Prediction and Lexical-based Interpretation for Disturbed Chinese Sentence Pair Matching
Abstract: In an era dominated by web-based intelligent customer services, the applications of Sentence Pair Matching are profoundly broad. Web agents, for example, automatically respond to customer queries by finding similar past questions, significantly reducing customer service expenses. While current large language models (LLMs) offer powerful text generation capabilities, they often struggle with opacity, potential text toxicity, and difficulty managing domain-specific and confidential business inquiries. Consequently, the widespread adoption of web-based intelligent customer services in real-world business still greatly relies on query-based interactions. In this paper, we introduce a series of model-agnostic techniques aimed at enhancing both the accuracy and interpretability of Chinese pairwise sentence-matching models. Our contributions include (1) An Edit-distance-weighted fine-tuning method, (2) A Bayesian Iterative Prediction algorithm, (3) A Lexical-based Dual Ranking Interpreter, and (4) A Bi-criteria Denoising strategy. Experimental results on the Large-scale Chinese Question Matching Corpus (LCQMC) with a disturbed test demonstrate that our fine-tuning and prediction methods can steadily improve matching accuracy, building on the current state-of-the-art models. Besides, our interpreter with denoising strategy markedly enhances token-level interpretation in rationality and loyalty. In both matching accuracy and interpretation, our approaches outperform classic methods and even LLMs.
Loading