Unlearning Misalignment for Personalized LLM Adaptation via Instance-Response-Dependent Discrepancies

TMLR Paper 4819 Authors

10 May 2025 (modified: 04 Jun 2025) · Under review for TMLR · CC BY 4.0
Abstract: While Large Language Models (LLMs) have revolutionized chatbot interactions, they often fall short in aligning responses with the nuanced preferences of individual users, a challenge rooted in the inherently subjective and proprietary nature of user preferences. Prompt-based learning, though effective at improving factual accuracy because it emphasizes universal correctness, is therefore insufficient for accurate personalized response alignment: user preferences vary widely across individuals and contexts, so aligning responses requires a more personalized and context-aware approach. To address this limitation, we propose Consistent Marginalization (CM), a novel framework that aims to unlearn misalignment by constructing a personalized memory bank of instance-response-dependent discrepancies from a small set of user preference samples. This personalized memory bank equips LLMs with the ability to understand, recall, and adapt to individual preferences, enabling more consistent and personalized responses. Evaluated across a diverse range of domain-specific datasets and model architectures, CM yields notable improvements in response alignment and robustness. We believe Consistent Marginalization represents a valuable step toward LLMs that serve as genuinely personable and adaptive conversational agents, understanding user preferences and generating responses better aligned with individual user expectations.
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: First time submitting to TMLR
Assigned Action Editor: ~Han_Zhao1
Submission Number: 4819