Keywords: Large Language Models, Sequential Recommendation, Preference Optimization, Reward Hacking
TL;DR: We propose SIRIUS, a pseudo-negative sampling method that mitigates reward hacking and improves LLM-based recommendation.
Abstract: Post-training adaptation has become the central paradigm for leveraging large language models (LLMs) in recommendation. While recent preference optimization methods, such as Direct Preference Optimization (DPO), enhance pairwise preference discrimination, they remain vulnerable to \emph{reward hacking}: models exploit imperfections in reward signals, leading to inflated training metrics without genuine recommendation gains.
We provide a theoretical analysis of this phenomenon from a gradient perspective and formalize the \emph{$\varepsilon$-insensitive region}, where pairwise updates exert negligible influence on the relative ordering between positives and unsampled negatives. We further show, under the Bradley–Terry model, that such regions can occupy a substantial portion of the preference distribution, inevitably causing misaligned rankings.
To address this issue, we propose \textbf{Si}mulated Preference Optimization for \textbf{R}eward-hacking m\textbf{i}tigation using Pseudo-negatives (\textbf{\our{}}). Our framework introduces pseudo-negative samples to enrich contrastive signals and reduce the prevalence of $\varepsilon$-insensitive regions.
Extensive experiments on three public benchmarks—LastFM, Goodreads, and Steam—demonstrate that \our{} consistently improves ranking quality and effectively mitigates reward hacking, providing both theoretical and practical insights for advancing LLM-based recommendation. Our code is available at
\url{https://anonymous.4open.science/r/C557-id}
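The $\varepsilon$-insensitive region described in the abstract can be illustrated with a minimal numerical sketch. This is illustrative only and not the paper's implementation: the function name `dpo_grad_weight`, the choice of $\beta$, and the threshold $\varepsilon$ are assumptions. Under the Bradley–Terry (sigmoid) link used by DPO, the gradient coefficient of a sampled pair is $\beta\,\sigma(-\beta \cdot \mathrm{margin})$, so once a sampled pair is well separated the update becomes nearly inert, leaving the ordering of positives against unsampled negatives unconstrained.

```python
import math

def dpo_grad_weight(margin: float, beta: float = 0.1) -> float:
    """Magnitude of the DPO gradient coefficient for a sampled
    (positive, negative) pair as a function of the implicit reward
    margin, under the Bradley-Terry / sigmoid link: beta * sigmoid(-beta * margin).
    Hypothetical helper for illustration; beta=0.1 is an assumed value."""
    return beta / (1.0 + math.exp(beta * margin))

# Once the sampled pair's margin is large, the pairwise update is
# nearly zero: scores of positives vs. UNSAMPLED negatives can drift
# freely -- a toy picture of the eps-insensitive region.
eps = 1e-3  # assumed threshold for "negligible" updates
margins = [0.0, 10.0, 50.0, 100.0]
weights = [dpo_grad_weight(m) for m in margins]
insensitive = [w < eps for w in weights]
# insensitive -> [False, False, True, True]: the two widest margins
# fall inside the eps-insensitive region.
```

Pseudo-negative sampling, as the abstract describes it, can be read as enlarging the set of contrasted pairs so that fewer positive/negative orderings are left in this inert regime; the exact construction of pseudo-negatives is given in the paper, not here.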
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 1328