Interactive Information Retrieval with Bandit Feedback

Huazheng Wang, Yiling Jia, Hongning Wang

2021 (modified: 24 Feb 2022)SIGIR 2021Readers: Everyone

Abstract: Information retrieval (IR) in nature is a process of sequential decision making. The system repeatedly interacts with the users to refine its understanding of the users' information needs, improve its estimation of result relevance, and thus increase the utility of its returned results (e.g., the result rankings). Distinct from traditional IR solutions that rigidly execute an offline trained policy, interactive information retrieval emphasizes online policy learning. This, however, is fundamentally difficult for at least three reasons. First, the system only collects user feedback on the presented results, aka, the bandit feedback. Second, users' feedback is known to be noisy and biased. Third, as a result, the system always faces the conflicting goals of improving its policy by presenting currently underestimated results to users versus satisfying the users by ranking the currently estimated best results on top. In this tutorial, we will first motivate the need for online policy learning in interactive IR, by highlighting its importance in several real-world IR problems where online sequential decision making is necessary, such as web search and recommendations. We will carefully address the new challenges that arose in such a solution paradigm, including sample complexity, costly and even outdated feedback, and ethical considerations in online learning (such as fairness and privacy) in interactive IR. We will prepare the technical discussions by first introducing several classical interactive learning strategies from machine learning literature, and then fully dive into the recent research developments for addressing the aforementioned fundamental challenges in interactive IR. Note that the tutorial on "Interactive Information Retrieval: Models, Algorithms, and Evaluation" will provide a broad overview on the general conceptual framework and formal models in interactive IR, while this tutorial covers the online policy learning solutions for interactive IR with bandit feedback.

0 Replies