Abstract: Large language models (LLMs) are prone to hallucinating unintended text due to false or outdated knowledge. Since retraining LLMs is resource intensive, there has been growing interest in model editing. Despite the emergence of benchmarks and approaches, such unidirectional editing and evaluation have failed to explore the reversal curse. In this paper, we study bidirectional language model editing, aiming to provide a rigorous evaluation of whether edited LLMs can recall the edited knowledge bidirectionally. We introduce a metric of reversibility and construct a benchmark dubbed Bidirectional Assessment for Knowledge Editing (BAKE) to evaluate whether post-edited models can recall the edited knowledge in the direction opposite to that of editing. Experimental results show that while most editing methods can accurately recall edited facts along the direction of modification, they exhibit substantial systematic deficiencies when evaluated in the reverse direction. Our findings also reveal that in-context learning (ICL) can mitigate the reversal curse to a certain extent.
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Jian_Kang1
Submission Number: 6967