Investigating causal understanding in LLMs

Published: 05 Dec 2022, Last Modified: 05 May 2023 · MLSW 2022
Abstract: We investigate the quality of causal world models of LLMs in very simple settings. We test whether LLMs can identify cause and effect in natural language settings (taken from BigBench) such as “My car got dirty. I washed the car. Question: Which sentence is the cause of the other?” and in multiple other toy settings. We probe the LLM's world model by changing the presentation of the prompt while keeping its meaning constant, e.g. by changing the order of the sentences or asking the opposite question. Additionally, we test whether the model can be “tricked” into giving wrong answers when the in-context examples (shots) follow a different pattern than the question. We report three findings. First, larger models yield better results. Second, k-shot outperforms one-shot, and one-shot outperforms zero-shot, in standard conditions. Third, LLMs perform worse in conditions where form and content differ. We conclude that the form of the presentation matters for LLM predictions or, in other words, that LLMs do not base their predictions solely on content. Finally, we detail some of the implications of this research for AI safety.
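To make the perturbations concrete, the following is a minimal Python sketch, not the authors' code; the example sentences, question wording, and helper names are illustrative assumptions based on the BigBench-style format quoted in the abstract.

```python
# Illustrative sketch of the prompt perturbations described in the abstract.
# All sentences, labels, and names here are hypothetical examples.

CAUSE = "My car got dirty."
EFFECT = "I washed the car."

def make_item(first: str, second: str, ask_cause: bool) -> str:
    """Build one cause/effect item in a BigBench-style format."""
    question = ("Which sentence is the cause of the other?"
                if ask_cause
                else "Which sentence is the effect of the other?")
    return f"{first} {second} Question: {question}"

# Standard presentation: cause stated first, asking for the cause.
standard = make_item(CAUSE, EFFECT, ask_cause=True)

# Perturbation 1: swap the sentence order while keeping the meaning constant.
swapped = make_item(EFFECT, CAUSE, ask_cause=True)

# Perturbation 2: ask the opposite question.
opposite = make_item(CAUSE, EFFECT, ask_cause=False)

# Shot-pattern mismatch: the in-context example uses one presentation
# while the test question uses another, probing whether the model
# follows the form of the shot or the content of the question.
shot = make_item(CAUSE, EFFECT, ask_cause=True) + f"\nAnswer: {CAUSE}"
mismatched = shot + "\n\n" + swapped

for name, prompt in [("standard", standard), ("swapped", swapped),
                     ("opposite", opposite), ("mismatched", mismatched)]:
    print(f"--- {name} ---\n{prompt}\n")
```

A model that relies on content alone should answer all four variants consistently; sensitivity to the swapped or mismatched variants would indicate that presentation form influences its predictions.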