MIRA: Leveraging Multi-Intention Co-click Information in Web-scale Document Retrieval using Deep Neural Networks
Abstract: We study deep recall models for industrial web search: given a user query, retrieve the hundreds of most relevant documents from billions of candidates. The common framework encodes queries and documents separately into distributed representations and matches them in a latent semantic space. However, all existing deep encoding models leverage only the information in the document itself, which is often insufficient in practice when matching against query terms, especially for hard tail queries. In this work we aim to leverage additional information about documents from their co-click neighbours to help document retrieval. The challenges are to extract information effectively and eliminate noise when involving co-click information, while meeting the scalability demands of real-time online serving. To handle the noise in co-click relations, we first propose a web-scale Multi-Intention Co-click document Graph (MICG), which builds co-click connections between documents at the click-intention level rather than the document level. We then present MIRA, an encoding framework based on BERT and graph attention networks that uses a two-factor attention mechanism to aggregate neighbours. To meet online latency requirements, we involve neighbour information only on the document side, which avoids the time-consuming query neighbour search in real-time serving. We conduct extensive offline experiments on two public datasets and one private web-scale dataset from major commercial search engines (Bing and Sougou), demonstrating the effectiveness and scalability of the proposed method compared with several baselines. A further case study reveals that co-click relations mainly help improve web search quality in two ways: key concept enhancement and query term complementation.
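The document-side design described in the abstract can be illustrated with a minimal sketch. It is not the MIRA model: the vectors stand in for encoder outputs, a single dot-product softmax attention stands in for the two-factor attention mechanism (whose factors the abstract does not specify), and the names `aggregate_document` and `retrieve` are hypothetical. The sketch only shows why aggregating neighbours offline on the document side leaves query-time work unchanged.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a non-empty list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_document(doc_vec, neighbour_vecs):
    """Offline step (assumed): blend a document vector with its co-click
    neighbours using plain dot-product attention. The document attends
    to itself as well, so a document with no neighbours is unchanged."""
    candidates = [doc_vec] + neighbour_vecs
    weights = softmax([dot(doc_vec, v) for v in candidates])
    dim = len(doc_vec)
    return [sum(w * v[i] for w, v in zip(weights, candidates))
            for i in range(dim)]


def retrieve(query_vec, doc_index, k):
    """Online step: rank precomputed document vectors by inner product
    with the query vector. No neighbour lookup happens at query time."""
    ranked = sorted(doc_index.items(),
                    key=lambda item: dot(query_vec, item[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]


# Toy usage: document "a" gains a component from its neighbour.
index = {
    "a": aggregate_document([1.0, 0.0], [[0.0, 1.0]]),
    "b": aggregate_document([0.0, 1.0], []),
}
top = retrieve([1.0, 0.0], index, k=1)
```

In a real system the brute-force sort in `retrieve` would be replaced by an approximate nearest-neighbour index; the point here is only that neighbour aggregation is paid once per document, not once per query.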