Hierarchical Reinforcement Learning for Long-term Fairness in Interactive Recommendation

Chongjun Xia; Xiaoyu Shi; Hong Xie; Quanliang Liu; Mingsheng Shang

Published: 01 Jan 2024, Last Modified: 24 Jul 2025, Venue: ICWS 2024, License: CC BY-SA 4.0

Abstract: With the growing influence of Recommender Systems (RS) in people’s daily lives, the issue of fairness in recommendations has become increasingly crucial. Previous fairness-aware methods focus on static or one-shot recommendation settings, where the recommendation model provides a static fairness solution by solving a fixed fairness-constrained optimization problem. However, these approaches struggle to maintain the delicate balance between recommendation accuracy and fairness in dynamic environments, because they fail to account for the evolving nature of RS, including changes in user preferences and item popularity over time. In this paper, we explore the problem of long-term fairness in interactive recommendation and address it through dynamic fairness-constrained decisions. We focus on maintaining the exposure fairness of items across different groups. We argue that fairness is not always in conflict with recommendation performance, especially when considering the spatiotemporal heterogeneity of user preferences for item popularity. Building on this observation, we propose HER4IF, a dynamic fairness-aware interactive recommendation method based on a hierarchical reinforcement learning framework. Its main idea is first to aggregate the popularity of interacted items with a time-forgetting model in the state representation to capture the user’s popularity preference. It then introduces a high-level agent that generates a dynamic fairness constraint based on the user’s current state, while a low-level agent generates recommendations under this constraint. Experiments on two datasets and an authentic reinforcement learning environment (KuaiSim) show the effectiveness and superiority of the proposed framework in terms of recommendation accuracy and fairness. The results demonstrate a win-win for fairness and accuracy in a dynamic recommendation setting when the dynamic nature of RS and the spatiotemporal heterogeneity of user preferences for item popularity are taken into account.
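To make the two-level idea concrete, the following is a minimal Python sketch under strong assumptions; it is not the HER4IF implementation. The abstract does not specify the form of the time-forgetting model, so an exponential decay is assumed here. The learned high-level agent is replaced by a hand-written rule, and the low-level agent by a greedy top-k selection. All function names and parameters (`decay`, `lo`, `hi`, `min_tail_fraction`) are hypothetical.

```python
import math


def decayed_popularity(popularities, decay=0.5):
    """Time-weighted mean popularity of a user's interacted items.

    popularities: oldest-first list of item popularity scores in [0, 1].
    decay: hypothetical forgetting rate (assumed exponential form);
           larger values forget older interactions faster.
    """
    if not popularities:
        return 0.0
    n = len(popularities)
    # The most recent interaction has age 0 and therefore weight 1.
    weights = [math.exp(-decay * (n - 1 - i)) for i in range(n)]
    return sum(w * p for w, p in zip(weights, popularities)) / sum(weights)


def high_level_constraint(pop_pref, lo=0.1, hi=0.5):
    """Stand-in for the high-level agent (a fixed rule, not a learned policy).

    Maps a popularity preference in [0, 1] to the minimum fraction of
    long-tail items to expose: users who favor niche items get a
    stricter (larger) fraction.
    """
    pop_pref = min(1.0, max(0.0, pop_pref))
    return lo + (hi - lo) * (1.0 - pop_pref)


def recommend_under_constraint(scores, is_long_tail, k, min_tail_fraction):
    """Stand-in for the low-level agent: greedy top-k under the constraint.

    Reserves ceil(min_tail_fraction * k) slots for the best-scoring
    long-tail items, fills the rest by score, and returns item indices
    ordered by descending score.
    """
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tail = [i for i in ranked if is_long_tail[i]]
    # Small epsilon guards against float error such as 0.3 * 10 > 3.
    need = max(0, math.ceil(min_tail_fraction * k - 1e-9))
    need = min(need, len(tail), k)
    chosen = tail[:need]
    chosen_set = set(chosen)
    for i in ranked:
        if len(chosen) >= k:
            break
        if i not in chosen_set:
            chosen.append(i)
            chosen_set.add(i)
    return sorted(chosen, key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    history = [0.9, 0.8, 0.2, 0.1]  # user drifted toward niche items
    pref = decayed_popularity(history)
    frac = high_level_constraint(pref)
    scores = [0.9, 0.8, 0.7, 0.2, 0.1]
    tail_flags = [False, False, False, True, True]
    print(pref, frac, recommend_under_constraint(scores, tail_flags, 3, frac))
```

In this sketch the constraint changes per user and per step because it is recomputed from the decayed popularity preference, which is the property the abstract contrasts with a fixed fairness-constrained optimization.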
