Keywords: Causal Relationship Extraction, Deceptive Correlation, Multi-Agent Framework, Categorized Benchmark Dataset
Abstract: Extracting accurate causal relationships from text is crucial for developing Causal Knowledge Graphs (CKGs), which support advanced reasoning and decision-making. Traditional approaches often struggle with linguistic ambiguity and the complexity of natural language. Existing benchmarks, like SemEval-2007 Task 4, primarily feature short sentences, limiting the evaluation of modern Large Language Models (LLMs) in longer contexts.
In this study, we present two key contributions: (1) a novel Multi-Agent Causal Extraction System that employs a multi-stage verification process, with a Judge agent for relationship extraction and a Critic agent for reasoning verification; and (2) a Categorized Benchmark Dataset containing 10,000 long-context examples across 20 causal and non-causal categories, including “deceptive correlations,” to test models' capabilities.
Our experiments reveal that while our system achieves human-level performance (89.66%) on SemEval-2007, accuracy drops to 70.00% on our benchmark, highlighting the need for more rigorous evaluations in causal reasoning.
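The Judge-then-Critic verification loop described in the abstract could be sketched roughly as follows. This is a minimal illustration only: all class and function names are hypothetical, and simple string heuristics stand in for the LLM-backed agents the paper actually uses.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Extraction:
    cause: str
    effect: str
    reasoning: str

def judge_agent(sentence: str) -> Extraction:
    """Judge agent (toy stand-in): proposes a cause-effect pair with reasoning.

    A rule-based split on an explicit cue phrase replaces the LLM call.
    """
    cause, _, effect = sentence.partition(" causes ")
    return Extraction(cause.strip(), effect.strip(),
                      reasoning="explicit causal cue 'causes' found")

def critic_agent(sentence: str, extraction: Extraction) -> bool:
    """Critic agent (toy stand-in): checks the Judge's output against the text."""
    return (extraction.effect != ""
            and extraction.cause in sentence
            and extraction.effect in sentence)

def extract_causal_relation(sentence: str) -> Optional[Extraction]:
    """Multi-stage pipeline: accept an extraction only if the Critic verifies it."""
    candidate = judge_agent(sentence)
    return candidate if critic_agent(sentence, candidate) else None

# A sentence with an explicit causal cue passes both stages;
# a merely correlational sentence is rejected by the Critic.
print(extract_causal_relation("Smoking causes lung cancer"))
print(extract_causal_relation("Ice cream sales rise alongside drowning deaths"))
```

In the real system the Critic would verify the Judge's chain of reasoning rather than string containment, and a "deceptive correlation" example is exactly the case the second call is meant to reject.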
Paper Type: Long
Research Area: AI/LLM Agents
Research Area Keywords: Causal Relationship Extraction, Large Language Models (LLMs), Benchmark Dataset, Deceptive Correlations
Contribution Types: NLP engineering experiment, Approaches low compute settings-efficiency, Data resources
Languages Studied: English
Submission Number: 2767