Abstract: Patent prior art queries are full patent applications which are much longer than standard web search topics. Such queries are composed of hundreds of terms and do not represent a focused information need. One way to make the queries more focused is to select a group of key terms as representatives. Existing works show that such a selection to reduce patent queries is a challenging task mainly because of the presence of ambiguous terms. Given this setup, we present a query modeling approach where we utilize patent-specific characteristics to generate more precise queries. We propose to automatically disambiguate query terms by employing noun phrases that are extracted using the global analysis of the patent collection. We further introduce a method for predicting whether expansion using noun phrases would improve the retrieval effectiveness. Our experiments show that we can obtain almost 20% improvement by performing query expansion using the true importance of the noun phrase queries. Based on this observation, we introduce various features that can be used to estimate the importance of the noun phrase query. We evaluated the effectiveness of the proposed method on the patent prior art search collection CLEF-IP 2010. Our experimental results indicate that the proposed features make good predictors of the noun phrase importance, and selective application of noun phrase queries using the importance predictors outperforms existing query generation methods.
0 Replies
Loading