Abstract: Query correction is the task of automatically detecting and correcting errors in the queries users type into a search engine. Misspelled queries can lead to user dissatisfaction and churn, but correcting a query accurately is difficult: a correction model must be capable of high-level language comprehension. Pre-trained language models (PLMs) have recently been applied successfully to text correction tasks, yet little work has addressed query correction. Applying PLMs directly to query correction in large-scale search systems is nontrivial for two reasons: 1) Expensive deployment. Serving such a model is computationally costly. 2) Lack of domain knowledge. A neural correction model needs massive amounts of training data to reach its full potential. To address these issues, we introduce KSTEM, a Knowledge-based Sequence To Edit Model for Chinese query correction. KSTEM recasts sequence generation as sequence tagging by mapping errors to five edit categories (KEEP, REPLACE, SWAP, DELETE, and INSERT), which reduces computational complexity. KSTEM also adopts a 2D position encoding composed of the internal and external order of the words. To compensate for the lack of domain knowledge, we propose a task-specific training paradigm for query correction that consists of edit strategy-based pre-training, user click-based post pre-training, and human label-based fine-tuning. Finally, we apply KSTEM to an industrial search system. Extensive offline and online experiments show that KSTEM significantly improves query correction performance. We hope that our experience will benefit researchers working in this area.
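
To make the sequence-to-edit idea concrete, the following is a minimal sketch of how per-token edit tags could be decoded back into a corrected query, together with one possible reading of the 2D position encoding. It is not KSTEM's actual implementation: the abstract names only the five categories, so the tag format `(operation, argument)`, the function names `apply_edits` and `two_d_positions`, and the exact semantics are assumptions made for illustration. In particular, the sketch assumes SWAP exchanges a token with its right neighbor, INSERT adds a token after the current one, the external order is a word's index in the query, and the internal order is a character's index within its word.

```python
from typing import List, Optional, Tuple

# Hypothetical tag format: (operation, optional argument token).
Edit = Tuple[str, Optional[str]]


def apply_edits(tokens: List[str], edits: List[Edit]) -> List[str]:
    """Decode one edit tag per input token into a corrected token list."""
    if len(tokens) != len(edits):
        raise ValueError("expected exactly one edit tag per token")
    out: List[str] = []
    i = 0
    while i < len(tokens):
        op, arg = edits[i]
        if op == "KEEP":
            out.append(tokens[i])
        elif op == "DELETE":
            pass  # drop the token
        elif op == "REPLACE":
            if arg is None:
                raise ValueError("REPLACE needs a replacement token")
            out.append(arg)
        elif op == "INSERT":
            # Assumption: the new token is inserted after the current one.
            if arg is None:
                raise ValueError("INSERT needs a token to insert")
            out.append(tokens[i])
            out.append(arg)
        elif op == "SWAP":
            # Assumption: swap with the right neighbor; the neighbor's own
            # tag is ignored because the swap consumes both tokens.
            if i + 1 >= len(tokens):
                raise ValueError("SWAP on the last token has no neighbor")
            out.append(tokens[i + 1])
            out.append(tokens[i])
            i += 1
        else:
            raise ValueError(f"unknown edit operation: {op}")
        i += 1
    return out


def two_d_positions(words: List[str]) -> List[Tuple[int, int]]:
    """Assumed 2D position: (word index in query, character index in word)."""
    return [
        (word_idx, char_idx)
        for word_idx, word in enumerate(words)
        for char_idx, _ in enumerate(word)
    ]


# Usage: fix a single wrong character with one REPLACE tag.
query = list("机器学系")
tags: List[Edit] = [("KEEP", None), ("KEEP", None), ("KEEP", None), ("REPLACE", "习")]
corrected = "".join(apply_edits(query, tags))
```

Because each input token receives a single tag, decoding is one linear pass over the query with no autoregressive generation loop, which is the kind of saving the tagging formulation is meant to provide.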