Personal Tokens Matter: Towards Token-Aware Training for Personalized LLMs

15 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Language Models, Personalization, Personal Tokens
Abstract: With large language models (LLMs) now performing strongly across diverse tasks, there is growing demand for them to personalize outputs for individual users. Personalization is typically framed as an additional layer on top of a base NLP task, requiring models to meet user-specific needs while still completing the underlying task. Because of this overlay nature, many tokens in a response serve the base task, while only a subset encode user-specific information. We term these personal tokens, as they are essential for rendering responses personalized. However, their varying positions and contents across tasks make them difficult to detect directly. To address this challenge, we propose PerContrast, a causal intervention–based method that identifies personal tokens by measuring each output token's dependence on user-specific information, achieving an F1 score of up to 87.8% on our benchmark. Building on this insight, we develop the PerCE loss, which adaptively upweights personal tokens during training via an expectation–maximization procedure, enabling the model to alternately identify and optimize these tokens. Experiments on multiple LLMs show that PerCE substantially improves personalization performance with minimal additional cost, yielding average gains of over 10% and up to 40.57% on the LongLaMP dataset, along with strong cross-task and cross-scenario transferability. These results highlight the central role of personal tokens and establish token-aware training as a simple yet effective paradigm for advancing personalized LLMs.
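The core training idea described above, a cross-entropy loss that upweights tokens flagged as personal, can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual PerCE implementation: the function name, the binary personal-token mask, and the `alpha` upweighting hyperparameter are all assumptions for exposition.

```python
import math

def weighted_cross_entropy(log_probs, targets, personal_mask, alpha=2.0):
    """Token-weighted cross-entropy: personal tokens get weight `alpha`.

    log_probs: list of dicts mapping token -> model log-probability at
        each output position.
    targets: list of gold tokens, one per position.
    personal_mask: list of 0/1 flags; 1 marks an (assumed) personal token.
    alpha: assumed upweighting factor for personal tokens.
    """
    total, norm = 0.0, 0.0
    for lp, tok, is_personal in zip(log_probs, targets, personal_mask):
        w = alpha if is_personal else 1.0   # upweight personal tokens
        total += -w * lp[tok]               # weighted negative log-likelihood
        norm += w
    return total / norm                     # weight-normalized mean loss

# Tiny example: the second token is "personal" and counts double.
log_probs = [{"hi": math.log(0.5)}, {"bob": math.log(0.25)}]
loss = weighted_cross_entropy(log_probs, ["hi", "bob"], [0, 1], alpha=2.0)
```

In an EM-style loop as the abstract describes, the mask itself would be re-estimated between optimization steps rather than fixed in advance.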
Primary Area: foundation or frontier models, including LLMs
Submission Number: 5585