Multi-Agent Causal Discovery Using Large Language Models

ACL ARR 2026 May Submission15965 Authors

26 May 2026 (modified: 08 Jun 2026) | ACL ARR 2026 May Submission | Readers: Everyone | CC BY 4.0

Keywords: Causal Discovery; LLM-Agent; LLMs

Abstract: Causal discovery aims to identify causal relationships between variables and is a fundamental problem across the sciences. Traditional statistical causal discovery (SCD) methods rely solely on observational data and ignore the contextual information available in metadata, whereas recent LLM-based methods exploit metadata but treat the large language model (LLM) as a single agent, leaving its judgments vulnerable to memorized or biased associations. To address this gap, we introduce MAC (Multi-Agent Causal Discovery Framework), which, to our knowledge, is the first framework to cast causal discovery as a multi-agent debate coupled with the autonomous selection of an SCD algorithm. MAC operates in two stages connected by a Meta Fusion mechanism. First, the Debate-Coding Module (DCM) debates which SCD algorithm best suits the data, selects it, and executes it to obtain a data-grounded initial graph; Meta Fusion then converts this graph into textual causal constraints. Second, the Meta-Debate Module (MDM) refines the graph through an adversarial Affirmative–Negative–Judge debate that adjudicates competing causal hypotheses against the combined metadata. Across five benchmark datasets and three metrics (F1, SHD, NHD), MAC achieves the best aggregate performance among five statistical and four LLM-based baselines, ranking first on 10 of 15 evaluation points with Gemini2.0-Flash (including a perfect reconstruction of the Earthquake graph), and remains robust across three backbone LLMs.
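To make the three reported metrics concrete, the following is a minimal Python sketch of how F1, SHD (structural Hamming distance), and NHD (normalized Hamming distance) are commonly computed between a ground-truth and a predicted directed graph. It is an illustration only, not the authors' evaluation code: the function names are hypothetical, the SHD variant shown counts a reversed edge as a single error, and NHD is normalized by the number of adjacency-matrix entries; the paper may use different conventions. The example uses the standard five-node Earthquake network, with a hypothetical prediction in which one edge is reversed.

```python
from itertools import combinations

def edge_set(adj):
    """Return the set of directed edges (i, j) in a 0/1 adjacency matrix."""
    n = len(adj)
    return {(i, j) for i in range(n) for j in range(n) if adj[i][j]}

def f1_score(true_adj, pred_adj):
    """F1 over directed edges: a reversed edge is both a miss and a false alarm."""
    true_edges, pred_edges = edge_set(true_adj), edge_set(pred_adj)
    tp = len(true_edges & pred_edges)
    if tp == 0:
        return 0.0
    precision = tp / len(pred_edges)
    recall = tp / len(true_edges)
    return 2 * precision * recall / (precision + recall)

def shd(true_adj, pred_adj):
    """SHD (assumed variant): count node pairs whose edge status differs.
    A missing, extra, or reversed edge each counts as one error."""
    n = len(true_adj)
    return sum(
        1
        for i, j in combinations(range(n), 2)
        if (bool(true_adj[i][j]), bool(true_adj[j][i]))
        != (bool(pred_adj[i][j]), bool(pred_adj[j][i]))
    )

def nhd(true_adj, pred_adj):
    """NHD (assumed variant): fraction of adjacency entries that differ."""
    n = len(true_adj)
    diff = sum(
        1
        for i in range(n)
        for j in range(n)
        if bool(true_adj[i][j]) != bool(pred_adj[i][j])
    )
    return diff / (n * n)

# Node order: 0 Burglary, 1 Earthquake, 2 Alarm, 3 JohnCalls, 4 MaryCalls.
# Ground truth: Burglary -> Alarm, Earthquake -> Alarm,
#               Alarm -> JohnCalls, Alarm -> MaryCalls.
TRUE_GRAPH = [
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]

# Hypothetical prediction: Earthquake -> Alarm is reversed to Alarm -> Earthquake.
PRED_GRAPH = [
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 0, 1, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]

if __name__ == "__main__":
    print("F1 :", f1_score(TRUE_GRAPH, PRED_GRAPH))
    print("SHD:", shd(TRUE_GRAPH, PRED_GRAPH))
    print("NHD:", nhd(TRUE_GRAPH, PRED_GRAPH))
```

Under these assumed definitions, a perfect reconstruction such as the one reported for Earthquake corresponds to F1 = 1, SHD = 0, and NHD = 0; higher is better for F1, lower is better for SHD and NHD.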

Paper Type: Long

Research Area: LLM agents

Research Area Keywords: multi-agent systems, agent coordination and negotiation, agent communication, planning in agents, LLM-based controllers, agent evaluation, tool use

Contribution Types: Model analysis & interpretability, NLP engineering experiment, Approaches to low-compute settings (efficiency)

Languages Studied: English

EMNLP 2026 AI Reviewing Experiment: yes

Submission Number: 15965
