Competition is the key: A Game Theoretic Causal Discovery Approach

16 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Causal Discovery, Reinforcement Learning
Abstract: We introduce a \textbf{game-theoretic reinforcement learning framework} for causal discovery in which a DDQN agent \emph{competes} with a strong baseline (GES or GraN-DAG) and always \emph{warm-starts} from the opponent’s graph. This yields three key guarantees: the learned graph is \emph{never worse} than the warm start, warm-starting \emph{accelerates convergence}, and with high probability the method selects the best candidate when $n$ is large enough. Formally, if $ n \ge \tfrac{8L^2}{\Delta_n^2}\log\bigl(\tfrac{2|C|}{\delta}\bigr), $ then with probability $1-\delta$ the algorithm recovers the population-optimal graph. Here $L$ is a Lipschitz constant of the score function, $\Delta_n$ is the empirical gap between the best and second-best candidate scores, $|C|$ is the number of candidate graphs considered, and $\delta \in (0,1)$ is the failure probability. To our knowledge, this is the first finite-sample consistency result for an RL-based causal discovery method. Empirically, DDQN-CD matches or outperforms GES and GraN-DAG on standard benchmarks (Sachs, Asia, Alarm, Child, Hepar2) and scales to large graphs (Dream, $\sim$100 nodes; Andes, $\sim$220 nodes). Our results demonstrate that RL-based discovery can be simultaneously \emph{provably safe}, \emph{sample-efficient}, and \emph{scalable}, helping bridge the gap between theoretical guarantees and practical performance.
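The sample-size condition above is easy to evaluate numerically. The sketch below computes the smallest $n$ satisfying $n \ge \tfrac{8L^2}{\Delta_n^2}\log\bigl(\tfrac{2|C|}{\delta}\bigr)$; the symbols ($L$, $\Delta_n$, $|C|$, $\delta$) come from the abstract, while the concrete numeric values plugged in are purely illustrative assumptions, not figures from the paper.

```python
import math

def sample_size_bound(L: float, gap: float, num_candidates: int, delta: float) -> int:
    """Smallest n with n >= (8 L^2 / gap^2) * log(2|C| / delta).

    L: Lipschitz constant of the score function.
    gap: empirical gap between best and second-best candidate scores (Delta_n).
    num_candidates: number of candidate graphs |C|.
    delta: failure probability in (0, 1).
    """
    return math.ceil((8 * L**2 / gap**2) * math.log(2 * num_candidates / delta))

# Illustrative values only (hypothetical, not from the paper):
n = sample_size_bound(L=1.0, gap=0.5, num_candidates=100, delta=0.05)
```

Note how the bound grows only logarithmically in $|C|$ and $1/\delta$, but quadratically as the score gap $\Delta_n$ shrinks, so closely-scored candidate graphs dominate the sample requirement.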
Primary Area: causal reasoning
Submission Number: 7436