Abstract: While many studies have been conducted on query understanding, there is limited understanding on why users start searches and how to predict search intent. In this paper, we propose to study this important but less explored problem. Our key intuition is that searches are triggered by different pre-search contexts, but the triggering relations are often hidden. For example, a user may search "bitcoin" because of a news article or an email the user just read, but the system does not know which of the pre-search contexts (the news article or the email) is the triggering source. Following this intuition, we conduct an in-depth analysis of pre-search context on a large-scale user log, which not only verifies the hidden triggering relations in the real world but also identifies a set of important characteristics of pre-search context and their triggered queries. Since the hidden triggering relations make it challenging to directly use pre-search context for intent prediction, we develop a mixture generative model to learn without any supervision how queries are triggered by different types of pre-search context. Further, we discuss how to apply our model to improve query prediction and query auto-completion. Our experiments on a large-scale of real-world data show that our model could accurately predict user search intent with pre-search context and improve upon the state-of-the-art methods significantly.
0 Replies
Loading