CausalARC: Abstract Reasoning with Causal World Models

Published: 23 Sept 2025, Last Modified: 22 Nov 2025, LAW, License: CC BY-NC 4.0
Keywords: world models, language models, causal reasoning, abstract reasoning, logical reasoning, causal discovery, program synthesis, reasoning evaluation
TL;DR: CausalARC provides an experimental testbed for reasoning under distribution shift, with tasks sampled from fully specified causal world models.
Abstract: On-the-fly reasoning often requires adaptation to novel problems under limited data and distribution shift. This work introduces CausalARC: an experimental testbed for AI reasoning in low-data and out-of-distribution regimes, modeled after the Abstraction and Reasoning Corpus (ARC). Each CausalARC reasoning task is sampled from a fully specified causal world model, formally expressed as a structural causal model. Principled data augmentations provide observational, interventional, and counterfactual feedback about the world model in the form of few-shot, in-context learning demonstrations. As a proof-of-concept, we illustrate the use of CausalARC for four language model evaluation settings: (1) abstract reasoning with test-time training, (2) counterfactual reasoning with in-context learning, (3) program synthesis, and (4) causal discovery with logical reasoning. Within- and between-model performance varied heavily across tasks, indicating room for significant improvement in language model reasoning.
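The abstract's key construction is that each task is sampled from a fully specified structural causal model (SCM), with observational, interventional, and counterfactual demonstrations derived from it. The sketch below is a minimal, hypothetical illustration of that three-rung sampling pattern on a toy two-variable SCM; the variable names and mechanisms are assumptions for exposition, not CausalARC's actual grid-based world models or code.

```python
import numpy as np

# Toy SCM sketch: X -> Y, illustrating the three kinds of feedback the
# abstract describes. Mechanisms here are hypothetical placeholders.

rng = np.random.default_rng(0)

def sample_exogenous(n):
    """Draw exogenous noise terms U_x, U_y."""
    return rng.normal(size=n), rng.normal(size=n)

def mechanisms(u_x, u_y, do_x=None):
    """Structural equations: X := U_x (unless intervened on), Y := 2*X + U_y."""
    x = u_x if do_x is None else np.full_like(u_x, do_x)
    y = 2.0 * x + u_y
    return x, y

# (1) Observational demonstrations: sample from the unmodified SCM.
u_x, u_y = sample_exogenous(5)
x_obs, y_obs = mechanisms(u_x, u_y)

# (2) Interventional demonstrations: apply do(X = 3) and re-run the mechanisms.
x_int, y_int = mechanisms(u_x, u_y, do_x=3.0)

# (3) Counterfactual demonstrations: abduction (recover the factual noise),
#     action (intervene), prediction (recompute Y for the same units).
u_y_abducted = y_obs - 2.0 * x_obs
x_cf, y_cf = mechanisms(u_x, u_y_abducted, do_x=3.0)

print("observational:", list(zip(x_obs.round(2), y_obs.round(2))))
print("interventional:", list(zip(x_int.round(2), y_int.round(2))))
print("counterfactual:", list(zip(x_cf.round(2), y_cf.round(2))))
```

In this toy setting, the three sample sets would play the role of the few-shot, in-context learning demonstrations the abstract refers to.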
Submission Type: Benchmark Paper (4-9 Pages)
Submission Number: 74