Abstract: We propose to mine intention phrases from large collections of queries to enable rich query interpretation that identifies both query intentions and their associated intention types. We formalize the notion of an intention phrase as a sequence of keywords paired with an intention type, propose three criteria for it (relevance, completeness, and clarity), and identify two key challenges: criterion modeling and data sparsity. To handle the criterion modeling challenge, we design a joint training framework that combines a sequence labeling model with a clarity classification model. To address the data sparsity challenge, we are the first to leverage definitions to learn word embeddings: we introduce a new data source (dictionaries) and develop a definition-augmented encoder that generates semantic representations for words and sentences. Our experiments over three large corpora (hotel, tourism, and product domains) demonstrate the advantage of our model over the baselines.