Keywords: Large Language Models, Machine Learning, Contrastive Decoding, Machine Unlearning, Retrieval-Augmented Generation, Knowledge Graph
Abstract: Large language models (LLMs) trained on web-scale data inevitably encode outdated, private, or undesired knowledge, posing challenges for privacy, safety, and factual reliability. While existing machine unlearning methods typically rely on retraining or fine-tuning, these approaches are costly and risk catastrophic forgetting. In this work, we propose CRED, an in-context unlearning method that enables LLMs to forget specific concepts at inference time without any parameter updates. CRED formulates unlearning as a decoding-time intervention: given a query, it constructs retrieval-augmented prompts from both a retain set and a forget set, then computes a contrastive residual vector from their decoder embeddings. This residual is injected into the decoder of the original prompt, guiding generation away from forget-set content while preserving relevant knowledge. Experiments on the TOFU and MUSE benchmarks demonstrate that CRED achieves effective concept erasure with minimal quality degradation. Additional analyses confirm its stability under 8-bit and 4-bit quantization, highlighting its robustness and practicality for real-world deployment.
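The contrastive-residual intervention described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names (`contrastive_residual`, `steer`), the scaling factor `alpha`, and the toy NumPy vectors standing in for real decoder hidden states are all assumptions made for clarity.

```python
import numpy as np

def contrastive_residual(h_retain, h_forget):
    # Direction pointing from the forget-set context toward the
    # retain-set context in decoder embedding space (an assumption
    # about how the paper's residual vector is formed).
    return h_retain - h_forget

def steer(h_orig, residual, alpha=1.0):
    # Inject the scaled residual into the original prompt's decoder
    # hidden state before continuing generation; alpha is a
    # hypothetical strength knob.
    return h_orig + alpha * residual

# Toy 4-dimensional hidden states standing in for decoder embeddings
# of the retain-augmented, forget-augmented, and original prompts.
h_retain = np.array([0.9, 0.1, 0.0, 0.2])
h_forget = np.array([0.1, 0.8, 0.3, 0.2])
h_orig = np.array([0.5, 0.5, 0.1, 0.2])

r = contrastive_residual(h_retain, h_forget)
h_steered = steer(h_orig, r, alpha=0.5)
# h_steered is shifted away from the forget direction: [0.9, 0.15, -0.05, 0.2]
```

In a real deployment the hidden states would come from the LLM's decoder for each of the three prompts, and the steered state would replace the original one at every generation step.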
Primary Area: foundation or frontier models, including LLMs
Submission Number: 1926