Causal AI Assistant: Facilitating Causal Data Science with Large Language Models

ACL ARR 2025 May Submission7702 Authors

20 May 2025 (modified: 29 Jul 2025), ACL ARR 2025 May Submission, CC BY 4.0

Abstract: Running a causal analysis end to end requires knowledge of a wide range of estimation methods and statistical assumptions, as well as a technical understanding of the phenomena of interest. Recent advances in large language models (LLMs) can circumvent the need for this expert knowledge by automating the inference pipeline, thereby widening access to causal inference tools. In this work, we present Causal AI Assistant (CAIA), an end-to-end pipeline for performing causal analysis. By implementing method selection with a tree-of-thoughts-inspired approach, our pipeline leverages LLMs' reasoning capabilities to select and execute appropriate inference methods, generating data-driven answers to natural language causal queries. We test our pipeline on preexisting datasets, on synthetic examples, and on datasets drawn from published social science studies. Through extensive evaluation, we show that our pipeline outperforms existing work in automated causal inference.

Paper Type: Long

Research Area: Language Modeling

Research Area Keywords: applications, chain-of-thought, code models, LLM/AI agents, neurosymbolic approaches, prompting

Contribution Types: NLP engineering experiment, Publicly available software and/or pre-trained models, Data resources

Languages Studied: English

Submission Number: 7702
