CoNRec: Context-Discerning Negative Recommendation with LLMs

ICLR 2026 Conference Submission 736 Authors

02 Sept 2025 (modified: 23 Dec 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Recommendation System, Large Language Models
TL;DR: Given the difficulty of understanding users' dislikes and the shortcomings of existing methods, we propose the first LLM framework for negative feedback modeling with semantic IDs; it achieves state-of-the-art results on a Taobao dataset.
Abstract: Understanding what users like is relatively straightforward; understanding what users dislike, however, remains a challenging and underexplored problem. Research into users' negative preferences has gained increasing importance in modern recommendation systems. Numerous platforms have introduced explicit negative feedback mechanisms and leverage such signals to refine their recommendation models. Beyond traditional business metrics, user experience-driven metrics, such as negative feedback rates, have become critical indicators for evaluating system performance. However, most existing approaches primarily use negative feedback as an auxiliary signal to enhance positive recommendations, paying little attention to directly modeling negative interests, which can be highly valuable in offline applications. Moreover, due to the inherent sparsity of negative feedback data, models often suffer from context understanding biases induced by positive feedback dominance. To address these challenges, we propose the first large language model (LLM) framework for negative feedback modeling with specially designed context-discerning modules. We use a hierarchical semantic ID representation to replace text-based item descriptions and introduce an item-level alignment task that enhances the LLM's understanding of the semantic context behind negative feedback. Furthermore, we design a Progressive Group Relative Policy Optimization (GRPO) training paradigm that enables the model to dynamically balance its use of positive and negative behavioral context. Our investigation further reveals a fundamental misalignment between the conventional next-negative-item prediction objective, which is heavily influenced by the system's recommendation order, and users' true negative preferences. To mitigate this, we propose a novel reward function and evaluation metric grounded in multi-day future negative feedback and its collaborative signals. Extensive experiments on a real-world industry-scale dataset from Taobao demonstrate that our method achieves state-of-the-art performance. Our work offers meaningful insights not only for the emerging field of negative feedback modeling but also for the broader recommendation community.
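As a loose illustration of the reward and GRPO components described in the abstract (the exact formulation is not given on this page), below is a minimal Python sketch. The function names, the embedding-based collaborative term, and the equal 0.5/0.5 weighting are all hypothetical choices for illustration, not the authors' actual reward design.

```python
import numpy as np

def future_negative_reward(predicted_items, future_negatives, item_embeddings):
    """Hypothetical reward: score a set of predicted negative items by their
    overlap with the user's negative feedback collected over the following
    days, plus an embedding-similarity ("collaborative signal") term."""
    hits = len(set(predicted_items) & set(future_negatives))
    hit_rate = hits / max(len(predicted_items), 1)
    # Collaborative term: for each prediction, take its best cosine similarity
    # to any future negative item in a shared item-embedding space.
    pred = np.stack([item_embeddings[i] for i in predicted_items])
    fut = np.stack([item_embeddings[i] for i in future_negatives])
    pred = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    fut = fut / np.linalg.norm(fut, axis=1, keepdims=True)
    sim = float((pred @ fut.T).max(axis=1).mean())
    return 0.5 * hit_rate + 0.5 * sim  # weighting is an assumption

def group_relative_advantages(rewards):
    """GRPO-style advantage: standardize rewards within a group of rollouts
    sampled for the same user context."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)

# Usage sketch: score several sampled rollouts for one user context, then
# feed the group-relative advantages into a policy-gradient update.
# rewards = [future_negative_reward(p, fut_neg, emb) for p in rollouts]
# advantages = group_relative_advantages(rewards)
```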
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 736