Abstract: Simulating user sessions in a way that closely approximates original user interactions is key to generating user data at any desired volume and variety, so that A/B testing in domain-specific search engines becomes scalable. In recent years, research on evaluating Information Retrieval (IR) systems has mainly focused on simulation as a means to improve user models and evaluation metrics for search engine performance, using test collections and user studies. However, test collections contain no user interaction data, and user studies are expensive to conduct. Thus, there is a need for a methodology to evaluate simulated user sessions. In this paper, we propose evaluation metrics to assess the realism of simulated sessions and describe a pilot study assessing the capability of generating simulated search sequences that approximate real behaviour. Our findings highlight the importance of investigating and utilising classification-based metrics in addition to distribution-based ones in the evaluation process.
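To make the distinction between the two families of metrics concrete, the sketch below illustrates (under assumptions not taken from the paper) one possible way to compare real and simulated sessions: a distribution-based check compares per-feature marginal distributions, while a classification-based check trains a classifier to tell real from simulated sessions, with accuracy near chance suggesting realistic simulation. The session features, statistics, and models used here are hypothetical examples, not the metrics proposed by the authors.

```python
# Illustrative sketch only; feature names and metrics are assumed, not the paper's.
import numpy as np
from scipy.stats import ks_2samp
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Toy stand-ins for real and simulated session summaries
# (columns: queries per session, clicks per session, mean dwell time in seconds).
real_sessions = rng.normal(loc=[3.0, 2.0, 25.0], scale=[1.0, 1.0, 8.0], size=(500, 3))
sim_sessions = rng.normal(loc=[3.2, 1.7, 22.0], scale=[1.1, 0.9, 9.0], size=(500, 3))

# Distribution-based check: compare marginal distributions feature by feature.
for i, name in enumerate(["queries", "clicks", "dwell_time"]):
    stat, p = ks_2samp(real_sessions[:, i], sim_sessions[:, i])
    print(f"KS statistic for {name}: {stat:.3f} (p={p:.3f})")

# Classification-based check (classifier two-sample test): a model that cannot
# distinguish real from simulated sessions (accuracy near 0.5) indicates that
# the simulator approximates real behaviour on these features.
X = np.vstack([real_sessions, sim_sessions])
y = np.concatenate([np.zeros(len(real_sessions)), np.ones(len(sim_sessions))])
clf = RandomForestClassifier(n_estimators=100, random_state=0)
acc = cross_val_score(clf, X, y, cv=5).mean()
print(f"Real-vs-simulated classification accuracy: {acc:.3f}")
```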