GUIDE : Generalized-Prior and Data Encoders for DAG Estimation

ICLR 2026 Conference Submission 13091 Authors

18 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Causal Discovery, LLM, Reasoning
Abstract: Modern causal discovery methods face critical limitations in scalability, computational efficiency, and adaptability to mixed data types, as evidenced by benchmarks on node scalability (30, $\leq50$, and $\geq70$ nodes), computational energy demands, and continuous/non-continuous data handling. Traditional algorithms such as PC, GES, and ICA-LiNGAM struggle with these challenges, exhibiting prohibitive energy costs at larger node counts and poor scalability beyond 70 nodes. We propose GUIDE, a framework that integrates Large Language Model (LLM)-generated adjacency matrices with observational data through a dual-encoder architecture. GUIDE optimizes computational efficiency, reducing runtime by $\approx$42\% on average compared to RL-BIC and KCRL, while achieving an average $\approx$117\% improvement in accuracy over both NOTEARS and GraN-DAG individually. During training, GUIDE's reinforcement learning agent dynamically balances reward maximization (accuracy) and penalty avoidance (DAG constraints), enabling robust performance across mixed data types and scalability to $\geq$70 nodes, a setting where baseline methods fail.
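The abstract does not specify how the DAG constraint is penalized during training. A common formulation in this line of work (used by NOTEARS, one of the baselines cited above) is the trace-exponential acyclicity measure $h(W) = \mathrm{tr}(e^{W \circ W}) - d$, which is zero if and only if the weighted adjacency matrix $W$ encodes a DAG. The sketch below is illustrative only; the function name and reward structure are assumptions, not GUIDE's actual implementation.

```python
import numpy as np
from scipy.linalg import expm


def acyclicity_penalty(W: np.ndarray) -> float:
    """NOTEARS-style DAG constraint h(W) = tr(exp(W ∘ W)) - d.

    Returns 0 exactly when the weighted graph W is acyclic,
    and a strictly positive value otherwise, so it can serve
    as the penalty term an RL agent learns to avoid.
    """
    d = W.shape[0]
    # W * W is the elementwise (Hadamard) product; expm is the
    # matrix exponential, whose trace counts weighted closed walks.
    return float(np.trace(expm(W * W)) - d)


# An upper-triangular adjacency (1 -> 2 only) is a DAG: zero penalty.
dag = np.array([[0.0, 1.0],
                [0.0, 0.0]])
# A 2-cycle (1 -> 2 -> 1) is not a DAG: positive penalty.
cyclic = np.array([[0.0, 1.0],
                   [1.0, 0.0]])

print(acyclicity_penalty(dag))     # ≈ 0.0
print(acyclicity_penalty(cyclic))  # > 0
```

A reward of the form "accuracy score minus λ·h(W)" then realizes the balance between reward maximization and penalty avoidance described in the abstract.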
Primary Area: causal reasoning
Submission Number: 13091