Unlearning Misalignment for Personalized LLM Adaptation via Instance-Response-Dependent Discrepancies

TMLR Paper 4819 Authors

10 May 2025 (modified: 30 Aug 2025) · Decision pending for TMLR · CC BY 4.0
Abstract: While Large Language Models (LLMs) have revolutionized chatbot interactions, they often fall short in aligning responses with the nuanced preferences of individual users—a challenge rooted in the inherently subjective and proprietary nature of user preferences. Consequently, prompt-based learning, though effective at enhancing factual accuracy through its emphasis on universal correctness, remains insufficient for accurate personalized response alignment. Because user preferences vary widely across individuals and contexts, aligning responses requires a more personalized and context-aware approach. To address this limitation, we propose Consistent Marginalization (CM)—a novel framework that aims to unlearn misalignment by constructing a personalized memory bank of instance-response-dependent discrepancies, built from a small set of user preference samples. This personalized memory bank equips LLMs with the ability to understand, recall, and adapt to individual preferences, enabling more consistent and personalized responses. Evaluated across a diverse range of domain-specific datasets and model architectures, CM yields notable improvements in response alignment and robustness. We believe Consistent Marginalization represents a valuable step toward enabling LLMs to become genuinely personable and adaptive conversational agents that understand user preferences and generate responses better aligned with individual user expectations.
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: Dear Action Editor,

We thank the reviewers for their constructive feedback. In response, we have made the following revisions to improve clarity, structure, and experimental grounding in the updated manuscript:

* **Figure 4 (Page 7)**: Added a new figure illustrating the experimental setup, including sample inputs, memory-banked responses, and model outputs, to clarify the workflow.
* **Motivating Example (Page 2)**: Updated the introduction example to better align with our discrete formulation, using a regional slang case (e.g., “Kopi O”).
* **Introduction Refinement**: Revised the introduction and removed the reference to “hallucination” to maintain focus on the core motivation.
* **Problem Setup and Related Work Reorganization**: Moved the problem setup earlier in the manuscript for improved flow, and relocated the related work section to the end.
* **Clarified Output Definition**: Explicitly defined the output space $Y$ at the **response level**.
* **Notation Revisions (Page 8)**: Corrected notation in the second line of the formulation, clarifying the role of $G$ as the LLM selecting from a predefined candidate set.
* **Selection Criterion (Section 4, Page 8)**: Revised the user-preference sample selection strategy for stronger justification and alignment with the updated problem formulation.
* **Significance Highlighting in Experiments**: Bolded only statistically significant improvements in the results section. For example, Table 3 now explicitly marks significant improvements across models and datasets.

**Additional updates**:

* **Eq. 4 → Eq. 2 (Page 5)**: Updated the formula to reflect changes to the *Instance–Response Dependent Discrepancies* term.
* **Eq. 5 → Eq. 3 (Page 5)**: Revised the equation for consistency with the new formulation.
* **Notation change (Page 5)**: Replaced all occurrences of $\sum_{Y'} p(Y' \mid Y, X)$ with $p(Y' \mid Y, X)$, highlighted in **blue** for reference.
* **Section 3.1**: Revised the paragraph immediately preceding the section.
* **Page 4**: Added a footnote clarifying notation.
* **Page 9**: Modified a sentence for notation consistency.

The modified equations are intended to better reflect the motivation for our method; they do not alter the underlying methodology, experimental setup, or reported results. All experiments, datasets, and evaluation protocols remain exactly as in the previous version. The revisions are purely notational and clarificatory in nature, aiming to improve readability and ensure consistency throughout the manuscript. We believe these changes improve the clarity, reproducibility, and alignment of our method with the evaluation protocols.

Thanks,
The Authors
Assigned Action Editor: ~Han_Zhao1
Submission Number: 4819