MA4DIV: Multi-Agent Reinforcement Learning for Search Result Diversification

Published: 29 Jan 2025, Last Modified: 29 Jan 2025 · WWW 2025 Oral · CC BY 4.0
Track: Search and retrieval-augmented AI
Keywords: Learning to Rank, Search Result Diversification, Multi-Agent Cooperation, Reinforcement Learning
TL;DR: A search result diversification (SRD) method using multi-agent reinforcement learning.
Abstract: Search result diversification (SRD), which aims to ensure that the documents selected for a ranking list cover a wide range of subtopics, is a significant and extensively studied problem in Web search and Information Retrieval. Existing methods primarily follow a "greedy selection" paradigm, i.e., selecting the document with the highest diversity score one at a time. These approaches tend to be inefficient and are easily trapped in suboptimal states. Other methods optimize an approximation of the objective function, but their results still remain suboptimal. To address these challenges, we introduce \textbf{M}ulti-\textbf{A}gent reinforcement learning (MARL) for search result \textbf{DIV}ersification, which we call \textbf{MA4DIV}. In this approach, each document is an agent, and search result diversification is modeled as a cooperative task among multiple agents. Modeling the SRD ranking problem as a cooperative MARL problem allows diversity metrics such as $\alpha$-NDCG to be optimized directly, while achieving high training efficiency. We conducted experiments on public TREC datasets and a large-scale industrial dataset. The results show that MA4DIV achieves substantial improvements in both effectiveness and efficiency over existing baselines, especially on the industrial-scale dataset.
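The abstract names $\alpha$-NDCG as the diversity metric that MA4DIV optimizes directly. For readers unfamiliar with it, below is a minimal sketch of how $\alpha$-NDCG@k is typically computed, assuming binary document-subtopic relevance judgments; the function and variable names are illustrative, not from the paper, and the "ideal" ranking is built greedily, which is the common approximation since computing the exact ideal is NP-hard.

```python
import math

def alpha_dcg(ranking, judgments, alpha=0.5, k=10):
    """alpha-DCG@k: a subtopic's gain decays by (1 - alpha) for each
    earlier document in the ranking that already covered it."""
    covered = {}  # subtopic -> number of times covered so far
    score = 0.0
    for rank, doc in enumerate(ranking[:k], start=1):
        gain = sum((1 - alpha) ** covered.get(s, 0)
                   for s in judgments.get(doc, ()))
        score += gain / math.log2(rank + 1)
        for s in judgments.get(doc, ()):
            covered[s] = covered.get(s, 0) + 1
    return score

def alpha_ndcg(ranking, judgments, alpha=0.5, k=10):
    """Normalize by a greedily constructed 'ideal' ranking."""
    remaining = list(judgments)
    ideal, covered = [], {}
    while remaining and len(ideal) < k:
        # Greedily pick the document with the highest marginal diversity gain.
        best = max(remaining, key=lambda d: sum(
            (1 - alpha) ** covered.get(s, 0) for s in judgments.get(d, ())))
        ideal.append(best)
        remaining.remove(best)
        for s in judgments.get(best, ()):
            covered[s] = covered.get(s, 0) + 1
    denom = alpha_dcg(ideal, judgments, alpha, k)
    return alpha_dcg(ranking, judgments, alpha, k) / denom if denom else 0.0

# Example: d1 and d2 cover subtopic A, d3 covers subtopic B, so
# interleaving B before the redundant d2 scores higher.
judgments = {"d1": {"A"}, "d2": {"A"}, "d3": {"B"}}
print(alpha_ndcg(["d1", "d3", "d2"], judgments))  # diverse order
print(alpha_ndcg(["d1", "d2", "d3"], judgments))  # redundant order, lower
```

The greedy-selection baselines the abstract criticizes repeatedly maximize exactly this kind of marginal gain one document at a time, whereas MA4DIV scores all document agents jointly.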
Submission Number: 462
