Adapting Retrieval Models to Task-Specific Goals using Reinforcement Learning

22 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: information retrieval, policy gradient, large action space
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: Given an input query, retrieval models are trained on user feedback data (e.g., click data) to output a ranked list of items. However, task-specific goals are difficult to optimize with supervised learning because they often correspond to non-differentiable losses. For example, we may want to optimize the recall or novelty of the top-k items in a recommendation task, or the accuracy of a black-box large language model (LLM) in a retrieval-augmented generation task. To optimize arbitrary task-specific losses, we propose a reinforcement learning-based framework that applies to any pretrained retrieval model. Specifically, our solution uses policy gradient and addresses the key challenge of large action spaces by reducing them to a binary action space conditioned on the query and each retrieved item. Our formulation also allows for exploration based on auxiliary retrieval models. We empirically evaluate the proposed algorithm on improving recall for a query-ad retrieval task on two datasets with 4K and 1.9M actions, respectively. We also show the benefit of the proposed algorithm on improving a custom metric, the novelty of retrieved items relative to existing algorithms, for a commercial search engine.
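To make the core idea concrete, below is a minimal, hypothetical sketch (not the authors' code) of a REINFORCE-style update in which the large action space (all retrievable items) is reduced to a per-(query, item) binary keep/drop decision over candidates from a pretrained retriever. The `BinaryRescorer` architecture, the `reward_fn` (standing in for a non-differentiable metric such as recall@k or novelty), and all dimensions are assumptions for illustration.

```python
import torch
import torch.nn as nn

class BinaryRescorer(nn.Module):
    """Assumed policy: maps a (query, item) embedding pair to a keep probability.

    This realizes the binary action-space reduction described in the abstract;
    the MLP architecture itself is a placeholder, not the paper's model.
    """
    def __init__(self, dim):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1)
        )

    def forward(self, q_emb, item_emb):
        # q_emb, item_emb: (n, d) query embedding broadcast over n candidates.
        logits = self.mlp(torch.cat([q_emb, item_emb], dim=-1)).squeeze(-1)
        return torch.distributions.Bernoulli(logits=logits)

def reinforce_step(policy, optimizer, q_emb, item_embs, reward_fn):
    """One policy-gradient step; reward_fn is any (non-differentiable) task metric."""
    dist = policy(q_emb.expand_as(item_embs), item_embs)
    actions = dist.sample()                          # binary keep/drop per item
    reward = reward_fn(actions)                      # scalar task-specific reward
    loss = -(dist.log_prob(actions).sum() * reward)  # REINFORCE estimator
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return reward

# Toy usage: random embeddings, dummy reward = precision of the kept subset.
if __name__ == "__main__":
    d, n = 16, 8
    policy = BinaryRescorer(d)
    opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
    q, items = torch.randn(1, d), torch.randn(n, d)
    relevant = torch.tensor([1.0] * (n // 2) + [0.0] * (n - n // 2))
    reward_fn = lambda a: (a * relevant).sum() / a.sum().clamp(min=1.0)
    for _ in range(10):
        reinforce_step(policy, opt, q, items, reward_fn)
```

Because each decision conditions on a single (query, item) pair, the policy never has to score the full item catalog as one action, which is what makes the 1.9M-action setting tractable in this framing.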
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 4457