Abstract: Large language models (LLMs) are prone to hallucinating unintended text due to false or outdated knowledge. Since retraining LLMs is resource intensive, there has been growing interest in model editing. Despite the emergence of benchmarks and approaches, such unidirectional editing and evaluation have failed to explore the reversal curse. In this paper, we study bidirectional language model editing, aiming to provide a rigorous evaluation of whether edited LLMs can recall the edited knowledge bidirectionally. We introduce a metric of reversibility and construct a benchmark dubbed Bidirectional Assessment for Knowledge Editing (BAKE) to evaluate whether post-edited models can recall the edited knowledge in the direction opposite to that of editing. Experimental results show that while most editing methods can accurately recall edited facts along the direction of modification, they exhibit substantial systematic deficiencies when evaluated in the reverse direction. Our findings also reveal that in-context learning (ICL) can mitigate the reversal curse to a certain extent.
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Jian_Kang1
Submission Number: 6967