Keywords: model editing, reversal curse, language model, relational knowledge, knowledge editing
TL;DR: Language models can learn to encode relational knowledge in a bilinear relational structure, a mechanism that mitigates the reversal curse and enables consistent model editing.
Abstract: The reversal curse, a language model's inability to infer an unseen fact "B is A" from a learned fact "A is B", is widely considered a fundamental limitation. We show that this is not an inherent failure but an artifact of how models encode knowledge. Training from scratch on synthetic relational knowledge graphs leads to the emergence of a bilinear relational structure within the models' hidden representations. This structure alleviates the reversal curse and facilitates inference of unseen reverse facts. Crucially, this bilinear geometry is foundational for consistent model editing: an update to a single fact propagates correctly to its reverse and to logically dependent relations. In contrast, models lacking this representation suffer from the reversal curse and fail to generalize model edits, leading to logical inconsistencies. Together, these results establish that training on a relational knowledge dataset induces bilinear internal representations, which in turn allow language models to behave in a logically consistent manner after editing. This suggests that the efficacy of language model editing depends not only on the choice of algorithm but also on the underlying representational geometry of the knowledge itself.
Primary Area: interpretability and explainable AI
Submission Number: 21044