Causal Strengths and Leaky Beliefs: Interpreting LLM Reasoning via Noisy-OR Causal Bayes Nets

Published: 22 Sept 2025, Last Modified: 22 Sept 2025
Venue: WiML @ NeurIPS 2025
License: CC BY 4.0
Keywords: Human-LLM alignment, causal reasoning
Abstract: The nature of intelligence in both humans and machines is a long-standing question. While there is no universally accepted definition, the ability to reason causally is often regarded as a pivotal aspect of intelligence \citep{lake2017building}. Evaluating causal reasoning in LLMs and humans on the same tasks therefore gives a more comprehensive picture of their respective strengths and weaknesses: it shows whether LLMs reproduce human biases and whether their causal reasoning lags behind or exceeds human performance. \textbf{Goals, Contributions \& Methods.} Our study asks: (Q1) Are LLMs aligned with humans when given the same reasoning tasks (the RW17 collider tasks; see \citet{dettki2025large,rehder2017failures})? (Q2) Do LLMs and humans reason consistently at the task level? (Q3) Do they have distinct reasoning signatures? We answer these by evaluating 20$+$ LLMs on eleven $C_1\!\to\!E\!\leftarrow\!C_2$ queries on semantically meaningful tasks (RW17) under \emph{Numeric} (a one-shot number as response, i.e., a likelihood judgment that the query node equals one; \Cref{fig:main_comparison_agg}) and \emph{Chain of Thought} (CoT; think first, then provide an answer) prompting at temperature $T{=}0$. Judgments are modeled with a leaky noisy-OR causal Bayes net (CBN) whose parameters $\theta=(b,m_1,m_2,p(C)) \in [0,1]^4$ include a shared prior $p(C)$; we select the winning model via AIC between a 3-parameter variant with symmetric causal strengths ($m_1{=}m_2$) and a 4-parameter asymmetric variant ($m_1{\neq}m_2$). The three research questions map to: human-LLM Spearman correlation $\rho$ (Q1), task-level LOOCV-$R^2$ from CBN fits (Q2), and parameter-signature profiling of $(b,m_1,m_2,p(C))$ (Q3).
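As a minimal sketch of the modeling step described above, the following Python code implements the textbook leaky noisy-OR collider and answers conditional queries by enumeration. It assumes that $b$ is the leak (background) probability, that $m_1, m_2$ are the causal strengths, and that both causes share the prior $p(C)$; the function names, the query interface, and the least-squares form of AIC are illustrative assumptions, not the authors' implementation.

```python
import math
from itertools import product


def p_effect(c1, c2, b, m1, m2):
    """Leaky noisy-OR: P(E=1 | C1=c1, C2=c2) = 1 - (1-b)(1-m1)^c1 (1-m2)^c2."""
    return 1.0 - (1.0 - b) * (1.0 - m1) ** c1 * (1.0 - m2) ** c2


def joint(c1, c2, e, b, m1, m2, pc):
    """Joint probability of one world (c1, c2, e) in the collider C1 -> E <- C2."""
    p_c1 = pc if c1 else 1.0 - pc  # shared prior p(C) for both causes (assumption)
    p_c2 = pc if c2 else 1.0 - pc
    pe = p_effect(c1, c2, b, m1, m2)
    return p_c1 * p_c2 * (pe if e else 1.0 - pe)


def query(target, evidence, b, m1, m2, pc):
    """P(target=1 | evidence) by summing over all 8 worlds.

    target is one of "C1", "C2", "E"; evidence maps node names to 0/1.
    """
    num = den = 0.0
    for c1, c2, e in product((0, 1), repeat=3):
        world = {"C1": c1, "C2": c2, "E": e}
        if any(world[k] != v for k, v in evidence.items()):
            continue  # world is inconsistent with the observed evidence
        p = joint(c1, c2, e, b, m1, m2, pc)
        den += p
        num += p * world[target]
    return num / den


def aic_least_squares(residuals, k):
    """AIC up to an additive constant, assuming Gaussian errors: n*ln(RSS/n) + 2k.

    The abstract does not state the fitting objective; this form is an assumption.
    """
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    if rss <= 0.0:
        raise ValueError("RSS must be positive to take its logarithm")
    return n * math.log(rss / n) + 2 * k


# Illustrative parameter values only (not fitted values from the study).
theta = dict(b=0.1, m1=0.5, m2=0.5, pc=0.5)
print(query("C1", {"E": 1}, **theta))
print(query("C1", {"E": 1, "C2": 1}, **theta))
```

Under this sketch, the symmetric variant corresponds to calling the functions with `m1 == m2` and `k=3`, and the asymmetric variant to free `m1`, `m2` and `k=4`; the variant with the lower AIC would be preferred.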
Our approach differs from \citet{dettki2025large} in three ways: it replaces the logistic link with a leaky noisy-OR, expands the number of evaluated LLMs ($\sim 5\times$), and enables clearer evaluation of explaining-away (EA) and Markov-violation (MV) diagnostics. EA emerges in collider graphs when, given that the effect is observed, evidence for one cause reduces the belief in the other cause; it is visually represented as a positive slope in \Cref{fig:comparison_agg_3}. MV occurs when the presence of one cause affects the belief in the other cause, violating the independence assumption of a collider structure; in \Cref{fig:comparison_agg_3} it is visible as a slope (e.g., for humans), whereas \texttt{o3} shows no MV.
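The two diagnostics can be illustrated with a small sketch that turns an agent's judgments into one number each. The index definitions, their sign conventions, the dictionary keys, and the judgment values below are all hypothetical choices made for illustration; they are not the paper's metrics or data.

```python
def explaining_away(judgments):
    """EA index: P(C1=1 | E=1, C2=0) - P(C1=1 | E=1, C2=1).

    Positive when learning that C2 is present lowers belief in C1
    (sign convention is an assumption of this sketch).
    """
    return judgments["C1|E=1,C2=0"] - judgments["C1|E=1,C2=1"]


def markov_violation(judgments):
    """MV index: P(C1=1 | C2=1) - P(C1=1 | C2=0).

    Zero when the causes are treated as independent with E unobserved,
    as the collider structure requires.
    """
    return judgments["C1|C2=1"] - judgments["C1|C2=0"]


# HYPOTHETICAL judgments on a 0-1 scale, made up for illustration only.
normative_agent = {
    "C1|E=1,C2=0": 0.85,
    "C1|E=1,C2=1": 0.60,
    "C1|C2=1": 0.50,
    "C1|C2=0": 0.50,
}
biased_agent = {
    "C1|E=1,C2=0": 0.70,
    "C1|E=1,C2=1": 0.72,
    "C1|C2=1": 0.65,
    "C1|C2=0": 0.40,
}

for name, agent in [("normative", normative_agent), ("biased", biased_agent)]:
    print(name, round(explaining_away(agent), 3), round(markov_violation(agent), 3))
```

In this toy setup the first agent shows explaining away without a Markov violation, while the second shows the reverse pattern.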
Submission Number: 282