Abstract: Validating information retrieval (IR) systems is an inherently difficult task. We present a study that uses the Kappa measure of inter-judge agreement to establish a reference quality benchmark for the responses provided by a custom-developed IR system, in a comparative analysis with an already existing search mechanism. Experiments show that it is difficult to assess the relevance of responses, as human judges do not always agree on what is relevant and what is not. The results show that, when the judges agree, the responses from our system are mostly better than those returned by the existing mechanism. This benchmarking mechanism opens the way for a more detailed investigation of non-relevant responses and possible improvements to the IR system design.
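The abstract does not spell out how the Kappa measure is computed; the sketch below shows the standard Cohen's kappa for two judges' binary relevance labels, kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is chance agreement. The function name and example data are illustrative only and are not taken from the paper.

```python
# Minimal sketch (illustrative, not the paper's code): Cohen's kappa for
# two judges labelling the same set of retrieved documents.
from collections import Counter

def cohens_kappa(judge_a, judge_b):
    """judge_a, judge_b: equal-length lists of relevance labels (e.g. 0/1)."""
    assert len(judge_a) == len(judge_b)
    n = len(judge_a)
    # Observed agreement: fraction of items where the judges give the same label.
    p_o = sum(a == b for a, b in zip(judge_a, judge_b)) / n
    # Chance agreement, from each judge's marginal label frequencies.
    freq_a, freq_b = Counter(judge_a), Counter(judge_b)
    labels = set(judge_a) | set(judge_b)
    p_e = sum((freq_a[label] / n) * (freq_b[label] / n) for label in labels)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical example: two judges rate ten responses as relevant (1) or not (0).
a = [1, 1, 0, 1, 0, 1, 1, 0, 0, 1]
b = [1, 0, 0, 1, 0, 1, 1, 1, 0, 1]
print(cohens_kappa(a, b))  # ~0.58; values near 1 indicate strong agreement
```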