Keywords: Personalization, Reasoning, Reinforcement Learning
Abstract: Recent advances have endowed Large Language Models (LLMs) with impressive general reasoning capabilities, yet they often struggle with personalization reasoning: the ability to analyze a user's history, infer their unique preferences, and generate tailored responses. To address this limitation, we introduce \textbf{TagPR}, a novel training framework that significantly enhances an LLM's intrinsic capacity for personalization reasoning through a ``tagging the thought'' approach. Our method first develops a data-driven pipeline to automatically generate and semantically label reasoning chains, creating a structured dataset that fosters interpretable reasoning. We then propose a synergistic training strategy that begins with Supervised Fine-Tuning (SFT) on this tagged data to establish foundational reasoning patterns, followed by a multi-stage reinforcement learning (RL) process. This RL phase is guided by a composite reward signal that integrates tag-based constraints with a novel Personalization Reward Model with User Embeddings (PRMU) to achieve fine-grained alignment with user-specific logic. Extensive experiments on the public LaMP benchmark and a self-constructed dataset demonstrate that our approach achieves state-of-the-art results, delivering an average improvement of 32.65\% over the base model across all tasks. Our work validates that structured, interpretable reasoning is a highly effective pathway to unlocking genuine personalization capabilities in LLMs.
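To make the composite reward design concrete, below is a minimal sketch of how a tag-based constraint could be combined with a PRMU-style alignment score. The tag names (`analyze`, `infer`, `generate`), the function names, the cosine-similarity stand-in for the PRMU, and the weighting scheme `alpha` are all illustrative assumptions, not the paper's actual implementation.

```python
import re

# Hypothetical semantic tags; the paper's actual tag vocabulary is not
# specified in the abstract.
TAG_PATTERN = re.compile(r"<(analyze|infer|generate)>.*?</\1>", re.DOTALL)

def tag_format_reward(reasoning: str) -> float:
    """1.0 if the reasoning chain contains at least one well-formed tagged span."""
    return 1.0 if TAG_PATTERN.search(reasoning) else 0.0

def prmu_alignment(response_emb: list[float], user_emb: list[float]) -> float:
    """Cosine similarity between a response embedding and a user embedding,
    standing in for the PRMU's learned alignment score."""
    dot = sum(a * b for a, b in zip(response_emb, user_emb))
    norm_r = sum(a * a for a in response_emb) ** 0.5
    norm_u = sum(b * b for b in user_emb) ** 0.5
    return dot / (norm_r * norm_u) if norm_r and norm_u else 0.0

def composite_reward(reasoning: str, response_emb: list[float],
                     user_emb: list[float], alpha: float = 0.5) -> float:
    """Weighted mix of the tag-based constraint and the alignment term;
    alpha is an assumed hyperparameter, not a reported value."""
    return (alpha * tag_format_reward(reasoning)
            + (1 - alpha) * prmu_alignment(response_emb, user_emb))

# Example usage with toy embeddings:
r = composite_reward("<analyze>User favors sci-fi titles.</analyze>",
                     [0.2, 0.9], [0.1, 0.95])
```

A weighted sum is only one plausible way to integrate the two signals; the paper's multi-stage RL process may schedule or gate these terms differently across stages.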
Supplementary Material: pdf
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 11054