Active Learning with Crowd Sourcing Improves Information Retrieval

Published: 20 Jun 2023, Last Modified: 10 Jul 2023, ILHF Workshop, ICML 2023
Keywords: active learning with human feedback, information retrieval, search, recommender systems
TL;DR: We democratize active learning with human feedback for information retrieval, using tools that are publicly available and reproducible.
Abstract: In this work, we show how to collect and use human feedback to improve complex models in information retrieval systems. Human feedback often improves model performance, yet little work has demonstrated combining human feedback with model tuning in an end-to-end setup using public resources. To this end, we develop a system called Crowd-Coachable Retriever (CCR), which uses crowd-sourced workers and open-source software to improve information retrieval systems by asking humans to label the best document, from a short list of retrieved candidates, for one randomly chosen query at a time. We make two main contributions. First, although our exploration space contains millions of possible documents, we carefully select only a few candidates for each query to reduce human workload. Second, we use latent-variable methods to cross-validate human labels and improve their quality. We benchmark CCR on two large-scale information retrieval datasets, where we actively learn the most relevant documents using baseline models and crowd workers, without accessing the labels provided in the original datasets. We show that CCR robustly improves model performance beyond zero-shot baselines, and we discuss key differences from active-learning simulations based on holdout data.
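The abstract outlines two mechanisms: selecting a short candidate list per query to limit annotation effort, and a latent-variable aggregation of redundant crowd labels. The sketch below is a minimal illustration of that loop under assumptions of my own (embedding-based candidate selection and a simple EM-style worker-reliability model standing in for the paper's latent-variable cross-validation); it is not the authors' implementation.

```python
import numpy as np

def select_candidates(query_emb, corpus_embs, k=5):
    """Hypothetical candidate selection: top-k documents for one query
    by cosine similarity between dense embeddings."""
    scores = corpus_embs @ query_emb
    scores = scores / (np.linalg.norm(corpus_embs, axis=1) *
                       np.linalg.norm(query_emb) + 1e-9)
    return np.argsort(-scores)[:k]

def aggregate_labels(votes, n_candidates, n_iters=20):
    """Aggregate redundant crowd votes with a simple EM-style scheme.

    votes: list of (worker_id, candidate_idx) pairs for one query.
    Returns the most likely candidate index and the posterior over candidates.
    This is a stand-in for the latent-variable cross-validation described
    in the abstract, not the paper's actual method.
    """
    workers = {w for w, _ in votes}
    reliability = {w: 1.0 for w in workers}          # start with equal trust
    posterior = np.ones(n_candidates) / n_candidates
    for _ in range(n_iters):
        # E-step: weight each vote by its worker's current reliability
        posterior = np.full(n_candidates, 1e-9)
        for w, c in votes:
            posterior[c] += reliability[w]
        posterior /= posterior.sum()
        # M-step: a worker looks reliable if it votes for likely candidates
        for w in workers:
            reliability[w] = float(np.mean(
                [posterior[c] for w2, c in votes if w2 == w]))
    return int(np.argmax(posterior)), posterior

# Example round with toy embeddings and three (noisy) workers:
rng = np.random.default_rng(0)
corpus = rng.normal(size=(1000, 64))                 # 1000 candidate documents
query = corpus[42] + 0.1 * rng.normal(size=64)       # query near document 42
shortlist = select_candidates(query, corpus, k=5)
votes = [("w1", 0), ("w2", 0), ("w3", 1)]            # indices into the shortlist
best, post = aggregate_labels(votes, n_candidates=5)
print(shortlist, best, post)
```

The aggregated (query, best document) pairs would then feed back into fine-tuning the retriever before the next round of candidate selection, closing the active-learning loop described above.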
Submission Number: 3