Abstract: We present CausalLink, an evaluation framework that interactively assesses the causal reasoning skills of conversational language models. Each CausalLink test case instantiates a hypothetical environment in which the model is instructed to apply interventions to entities whose interactions follow predefined causal relations generated from controllable causal graphs. This design isolates causal capabilities from the confounding effects of world knowledge and semantic cues. We evaluate a series of LLMs in a scenario featuring the movements of geometric shapes and find that models begin to exhibit reliable reasoning over two or three variables at the 14-billion-parameter scale. However, the performance of state-of-the-art models such as GPT-4o degrades below random chance as the number of variables increases. We identify and analyze several key failure modes.
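The abstract's core mechanism pairs a controllable causal graph with do-style interventions on abstract entities. Below is a minimal Python sketch of that idea, assuming a simple chain graph and invented entity names (circle, square, triangle); it is an illustration of the general technique, not the authors' released code.

```python
import random

# Hypothetical illustration: a linear-chain causal graph over binary
# "moves" variables, with a do-style intervention that severs the
# intervened variable from its parents. Entity names and the chain
# structure are assumptions for this sketch.

def make_chain_graph(variables):
    """Build v0 -> v1 -> ... -> vn: each variable is caused by its predecessor."""
    return {child: [parent] for parent, child in zip(variables, variables[1:])}

def simulate(graph, variables, intervention=None):
    """Propagate states in topological order; do(X)=value ignores X's parents."""
    state = {}
    for v in variables:
        if intervention and v in intervention:
            state[v] = intervention[v]          # intervened: parents are cut
        else:
            parents = graph.get(v, [])
            state[v] = state[parents[0]] if parents else random.random() < 0.5
    return state

variables = ["circle", "square", "triangle"]    # assumed entity names
graph = make_chain_graph(variables)             # circle -> square -> triangle
outcome = simulate(graph, variables, intervention={"square": True})
print(outcome)  # triangle follows the intervened square; circle is unaffected
```

Because the ground-truth outcome of any intervention is computable from the graph, a model's answer can be scored automatically, and graph size can be scaled to probe how reasoning degrades with more variables.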
Paper Type: Long
Research Area: Dialogue and Interactive Systems
Research Area Keywords: evaluation and metrics, commonsense reasoning, causality
Contribution Types: Model analysis & interpretability, Data resources
Languages Studied: English
Submission Number: 1580