Counterfactual Multi-player Bandits for Explainable Recommendation Diversification

Yansen Zhang, Bowei He, Xiaokun Zhang, Haolun Wu, Zexu Sun, Chen Ma

Published: 03 Oct 2025, Last Modified: 21 Jan 2026. Machine Learning and Knowledge Discovery in Databases, Research Track. License: CC BY-SA 4.0
Abstract: Existing recommender systems tend to prioritize items closely aligned with users’ historical interactions, inevitably trapping users in a “filter bubble”. Recent efforts have been dedicated to improving the diversity of recommendations, but they suffer from two major issues: 1) a lack of explainability, making it difficult for system designers to understand how diverse recommendations are generated, and 2) restriction to specific metrics, making it difficult to enhance non-differentiable diversity metrics. To this end, we propose a Counterfactual Multi-player Bandits (CMB) method to deliver explainable recommendation diversification across a wide range of diversity metrics. Leveraging a counterfactual framework, our method identifies the factors influencing diversity outcomes. Meanwhile, we adopt multi-player bandits to optimize the counterfactual objective, making the method adaptable to both differentiable and non-differentiable diversity metrics. Extensive experiments conducted on three real-world datasets demonstrate the applicability, effectiveness, and explainability of the proposed CMB. © 2026 The Author(s), under exclusive license to Springer Nature Switzerland AG.
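The abstract notes that bandit-style optimization can handle non-differentiable diversity metrics, since the metric is only queried as a black-box reward rather than differentiated. The paper's actual CMB algorithm is not reproduced here; the following is a minimal illustrative sketch of the general idea, assuming one epsilon-greedy player per recommendation slot and category coverage as a hypothetical non-differentiable diversity reward shared by all players.

```python
import random

def coverage(items, item_categories):
    """Non-differentiable diversity metric: fraction of all categories
    covered by the recommended item list (hypothetical example metric)."""
    covered = {item_categories[i] for i in items}
    return len(covered) / len(set(item_categories.values()))

class SlotPlayer:
    """One bandit player per recommendation slot, using epsilon-greedy
    arm selection over the item catalog."""
    def __init__(self, n_items, eps=0.1):
        self.eps = eps
        self.counts = [0] * n_items
        self.values = [0.0] * n_items

    def select(self, rng):
        if rng.random() < self.eps:
            return rng.randrange(len(self.counts))  # explore
        return max(range(len(self.counts)), key=lambda i: self.values[i])

    def update(self, arm, reward):
        # Incremental mean update of the arm's estimated reward.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

def train_list(n_slots, item_categories, rounds=2000, seed=0):
    """Jointly train one player per slot against the shared diversity
    reward, then return the greedy recommendation list."""
    rng = random.Random(seed)
    n_items = len(item_categories)
    players = [SlotPlayer(n_items) for _ in range(n_slots)]
    for _ in range(rounds):
        picks = [p.select(rng) for p in players]
        reward = coverage(picks, item_categories)  # black-box metric
        for player, arm in zip(players, picks):
            player.update(arm, reward)
    # Final list: each slot picks its greedily best item.
    return [max(range(n_items), key=lambda i: p.values[i]) for p in players]
```

Because the metric is evaluated only on sampled lists, swapping `coverage` for any other list-level diversity measure requires no change to the update rule, which is the adaptability property the abstract highlights.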