Abstract: We address the problem of assigning each query word an appropriate weight in the retrieval function. Term weights depend on the relationships among the query words and directly affect retrieval performance. However, a system may adopt any of several retrieval models, each requiring its own approach to setting term weights, and such empirical settings cannot guarantee improved retrieval quality. We propose a unified approach that assigns an individual weight to each query word across different retrieval functions. We examine popular retrieval functions and propose to regard the retrieval function as a linear classification model that predicts the relevance of a document. The parameters of the learning model can then be interpreted as the term weights in the retrieval model. For each query topic, we use both a generative model and a discriminative model to estimate the term weights, taking relevance feedback information as the training data. Our analysis also gives more insight into Rocchio's framework for relevance feedback, which can be viewed as a special case of the generative model. Experimental results on benchmark datasets show that, by estimating a proper weight for each query word, our approach outperforms the BM25 baseline methods and achieves performance equivalent to that of the probabilistic language model.
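The central idea, treating the retrieval score as a linear model whose parameters are per-term weights learned from relevance feedback, can be illustrated with a small sketch. This is not the paper's implementation: the function names, the toy feature values, and the choice of a centroid difference (a Rocchio-style stand-in for the generative estimate) and plain logistic regression (a stand-in for the discriminative estimate) are all assumptions made for illustration. Each document is assumed to be represented by one non-negative feature per query term, for example that term's contribution to a BM25-like score.

```python
import math

def score(weights, feats):
    """Linear retrieval score: sum over terms of weight times per-term feature."""
    return sum(weights.get(t, 0.0) * v for t, v in feats.items())

def centroid_weights(query_terms, relevant, nonrelevant):
    """Rocchio-style estimate (assumed stand-in for the generative model):
    mean feature over relevant docs minus mean over non-relevant docs."""
    def mean(docs, t):
        return sum(d.get(t, 0.0) for d in docs) / len(docs) if docs else 0.0
    return {t: mean(relevant, t) - mean(nonrelevant, t) for t in query_terms}

def logistic_weights(query_terms, relevant, nonrelevant, lr=0.5, epochs=200):
    """Discriminative estimate (assumed): logistic regression without a bias
    term, fit by batch gradient ascent on the log-likelihood."""
    weights = {t: 0.0 for t in query_terms}
    data = [(d, 1.0) for d in relevant] + [(d, 0.0) for d in nonrelevant]
    for _ in range(epochs):
        grad = {t: 0.0 for t in query_terms}
        for feats, label in data:
            p = 1.0 / (1.0 + math.exp(-score(weights, feats)))
            for t in query_terms:
                grad[t] += (label - p) * feats.get(t, 0.0)
        for t in query_terms:
            weights[t] += lr * grad[t] / len(data)
    return weights

# Hypothetical feedback data for one query topic (values are made up).
query_terms = ["jaguar", "speed"]
relevant = [{"jaguar": 1.2, "speed": 0.9}, {"jaguar": 0.8, "speed": 1.1}]
nonrelevant = [{"jaguar": 1.0, "speed": 0.0}, {"jaguar": 1.1, "speed": 0.1}]

w_centroid = centroid_weights(query_terms, relevant, nonrelevant)
w_logistic = logistic_weights(query_terms, relevant, nonrelevant)
```

In this toy setting both estimators give "speed" a larger weight than "jaguar", because "jaguar" occurs about equally in relevant and non-relevant feedback documents and so does little to separate them.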