Can LLMs Serve as Causal Inference Agents? A Study on Post-Training Methods

Can LLMs Serve as Causal Inference Agents? A Study on Post-Training Methods

ICLR 2026 Conference Submission24888 Authors

20 Sept 2025 (modified: 08 Oct 2025)ICLR 2026 Conference SubmissionEveryoneRevisionsBibTeXCC BY 4.0

Keywords: Large Language Models (LLMs), Causal Inference, Post-training

Abstract: Despite the potential of Large Language Models (LLMs) to democratize causal inference, they currently struggle with quantitative reasoning. This paper investigates whether post-training can transform an LLM into a practical and accessible causal inference agent for non-professionals. To facilitate this, we first introduce the DeepCausal dataset, a novel collection of seven computational causal inference tasks designed for both training and evaluation. We then propose DeepCausal, an LLM-based agent that enables users to perform complex causal analysis using natural language. Our core methodology involves a comprehensive comparison of online and offline post-training techniques. We find that while offline training equips LLMs with fundamental causal concepts, online post-training is crucial for teaching them how to apply these rules to solve problems, resulting in a significantly more effective, robust, and generalizable model. Our extensive experiments demonstrate that DeepCausal effectively performs causal effect estimation, providing clear, interpretable explanations in natural language. By lowering the technical barrier, our work makes complex causal analysis accessible to a broader audience and establishes the viability of using post-trained LLMs for sophisticated causal reasoning.

Primary Area: applications to computer vision, audio, language, and other modalities

Submission Number: 24888

Loading