Abstract: This paper addresses the challenge of the semantic gap between user queries and web content, commonly referred to as asymmetric text matching, within the domain of web search. By leveraging BERT for reading comprehension, current algorithms enable significant advancements in query understanding, but still encounter limitations in effectively resolving the asymmetrical ranking problem due to model comprehension and summarization constraints.To tackle this issue, we propose the QAGR (Question-Answer Generation and Ranking) method, comprising an offline module called QAGeneration and an online module called QARanking. The QAGeneration module utilizes large language models (LLMs) to generate high-quality question-answering pairs for each web page. This process involves two steps: generating question-answer pairs and performing verification to eliminate irrelevant questions, resulting in high-quality questions associated with their respective documents. The QARanking module combines and ranks the generated questions and web page content. To ensure efficient online inference, we design the QARanking model as a homogeneous dual-tower model, incorporating query intent to drive score fusion while balancing keyword matching and asymmetric matching. Additionally, we conduct a preliminary screening of questions for each document, selecting only the top-N relevant questions for further relevance calculation.Empirical results demonstrate the substantial performance improvement of our proposed method in web search. We achieve over 8.7% relative offline relevance improvement and over 8.5% online engagement gain compared to the state-of-the-art web search system. Furthermore, we deploy QAGR to online web search engines and share our deployment experience, including production considerations and ablation experiments. This research contributes to advancing the field of asymmetric web search and provides valuable insights for enhancing search engine performance.
Loading