The Reversal Curse: LLMs trained on “A is B” fail to learn “B is A”

Published: 16 Jan 2024, Last Modified: 04 Apr 2024 · ICLR 2024 poster
Keywords: LLMs, Large Language Models, Question Answering, Generalization, Knowledge Representation, Logical Inference, Relations
TL;DR: We demonstrate experimentally that LLMs trained on facts in one direction ("A is B") do not generalize to the reverse direction ("B is A").
Abstract: We expose a surprising failure of generalization in auto-regressive large language models (LLMs). If a model is trained on a sentence of the form "_A_ is _B_", it will not automatically generalize to the reverse direction "_B_ is _A_". This is the **Reversal Curse**. For instance, if a model is trained on "Valentina Tereshkova was the first woman to travel to space", it will not automatically be able to answer the question "Who was the first woman to travel to space?". Moreover, the likelihood of the correct answer ("Valentina Tereshkova") will not be higher than for a random name. Thus, models fail to generalize a prevalent pattern in their training set: if "_A_ is _B_" occurs, "_B_ is _A_" is more likely to occur. However, if "_A_ is _B_" appears _in-context_, models can deduce the reverse relationship. We provide evidence for the Reversal Curse by finetuning GPT-3 and Llama-1 on fictitious statements such as "Uriah Hawthorne is the composer of _Abyssal Melodies_" and showing that they fail to correctly answer "Who composed _Abyssal Melodies_?". The Reversal Curse is robust across model sizes and model families and is not alleviated by data augmentation. We also evaluate ChatGPT (GPT-3.5 and GPT-4) on questions about real-world celebrities, such as "Who is Tom Cruise's mother?" [A: Mary Lee Pfeiffer] and the reverse "Who is Mary Lee Pfeiffer's son?". GPT-4 correctly answers questions like the former 79% of the time, compared to 33% for the latter. Code available at: https://github.com/lukasberglund/reversal_curse.
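The abstract's likelihood claim, that after training on "_A_ is _B_" the reversed query assigns the correct name no more probability than a random one, can be made concrete with a short log-likelihood comparison. The sketch below is illustrative only: it uses off-the-shelf GPT-2 via Hugging Face `transformers` as a stand-in for the finetuned GPT-3/Llama-1 models, and the prompt, correct name, and baseline name ("Nathan Caldwell") are hypothetical examples, not the paper's data or evaluation code (see the linked repository for that).

```python
# Sketch: compare the log-likelihood of the correct name vs. a random name
# on a reversed query. GPT-2 stands in for the finetuned models in the paper.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def completion_logprob(prompt: str, completion: str) -> float:
    """Total log-probability the model assigns to `completion` after `prompt`.
    Assumes the completion starts at a clean token boundary (a leading space
    usually ensures this for GPT-2's BPE)."""
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(prompt + completion, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    # Position t of the shifted logits predicts input token t+1.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = full_ids[0, 1:]
    idx = torch.arange(prompt_len - 1, targets.shape[0])
    return log_probs[idx, targets[idx]].sum().item()

# The Reversal Curse predicts these two scores are close to each other
# for a model finetuned only on the forward direction of the fact.
prompt = "The composer of Abyssal Melodies is"
print("correct:", completion_logprob(prompt, " Uriah Hawthorne"))
print("random: ", completion_logprob(prompt, " Nathan Caldwell"))  # hypothetical baseline
```

The same scoring function covers both directions of a fact: score the forward prompt ("Uriah Hawthorne is the composer of") against the work's title to confirm the fact was learned, then score the reversed prompt as above.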
Supplementary Material: pdf
Primary Area: generative models
Submission Number: 7755