Prequential Evidence Pruning: Information-Theoretic Edge Selection for Ordering-Based Causal Discovery

20 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Causal Discovery, Ordering-based Methods, Information Theory, Minimum Description Length, Conditional Mutual Information, Prequential Scoring, Context-aware Pruning
TL;DR: PEP is a plug‑in pruning module for ordering‑based causal discovery: it keeps an edge only if its out‑of‑sample (prequential) log‑likelihood gain exceeds a computed MDL gate, yielding consistent gains across multiple backbones.
Abstract: Ordering-based causal discovery reduces the complex problem of structure learning to parent selection given a candidate topological order. However, the pruning stage remains a critical bottleneck, as widely used procedures rely on marginal, additivity-constrained tests with manually tuned thresholds. These limitations often prevent the detection of non-additive interactions and hinder reproducibility. To address these challenges, we introduce *Prequential Evidence Pruning* (PEP), a framework that reformulates pruning as a local information-theoretic model selection problem. For each candidate edge, PEP computes the prequential log-evidence gain by evaluating the predictive density of a child node conditioned on its current co-parents using a sample-splitting strategy. An edge is retained if and only if this gain exceeds an adaptive Minimum Description Length (MDL) penalty that accounts for the sample size, the number of admissible parents, and the set size. Theoretically, we establish that the population target of the evidence gain corresponds to the Conditional Mutual Information (CMI). Furthermore, we prove that the statistic is stable under bounded log-loss regret and that prequential scoring provides finite-sample concentration guarantees. Empirically, instantiating PEP with a pre-trained tabular foundation model yields consistent improvements across diverse ordering backbones. Notably, our framework incorporates a hierarchical pruning strategy that enables scalability to higher-dimensional graphs, effectively elevating the pruning stage from marginal testing to scalable, context-aware evidence maximization.
Primary Area: causal reasoning
Submission Number: 22354
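
The edge-retention rule described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the abstract instantiates the predictive density with a pre-trained tabular foundation model, whereas this sketch swaps in a Gaussian linear regression; the sample-splitting scheme is taken to be K-fold; and the penalty in `mdl_gate` is a hypothetical placeholder, since the abstract states only that the gate depends on the sample size, the number of admissible parents, and the set size. All function names are hypothetical.

```python
import numpy as np


def gaussian_log_density(X_train, y_train, X_test, y_test):
    """Fit least squares on the training split and return the per-sample
    Gaussian log-density of the held-out targets (stand-in predictor)."""
    A_train = np.column_stack([np.ones(len(y_train)), X_train])
    A_test = np.column_stack([np.ones(len(y_test)), X_test])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    var = max(np.var(y_train - A_train @ coef), 1e-12)
    err = y_test - A_test @ coef
    return -0.5 * (np.log(2 * np.pi * var) + err ** 2 / var)


def evidence_gain(child, candidate, coparents, n_folds=5, seed=0):
    """Mean held-out log-likelihood gain (nats per sample) from adding
    `candidate` to the conditioning set `coparents` (shape (n, k), k >= 0)."""
    n = len(child)
    base = np.asarray(coparents, dtype=float).reshape(n, -1)
    full = np.column_stack([base, candidate])
    folds = np.array_split(np.random.default_rng(seed).permutation(n), n_folds)
    total = 0.0
    for test_idx in folds:
        train = np.ones(n, dtype=bool)
        train[test_idx] = False
        ll_full = gaussian_log_density(full[train], child[train],
                                       full[test_idx], child[test_idx])
        ll_base = gaussian_log_density(base[train], child[train],
                                       base[test_idx], child[test_idx])
        total += np.sum(ll_full - ll_base)
    return total / n


def mdl_gate(n, n_admissible, set_size):
    """Hypothetical penalty in nats per sample: a parameter cost plus a cost
    for indexing the parent set. Illustrative only; not the paper's formula."""
    return (0.5 * np.log(n) + set_size * np.log(max(n_admissible, 2))) / n


def keep_edge(child, candidate, coparents, n_admissible):
    """Retain the edge only if the held-out gain exceeds the gate."""
    n = len(child)
    k = np.asarray(coparents).reshape(n, -1).shape[1]
    gain = evidence_gain(child, candidate, coparents)
    return bool(gain > mdl_gate(n, n_admissible, k + 1))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x1, x2 = rng.normal(size=500), rng.normal(size=500)
    y = 2.0 * x1 + rng.normal(size=500)
    no_parents = np.empty((500, 0))
    print("gain x1 -> y:", evidence_gain(y, x1, no_parents))
    print("gain x2 -> y | x1:", evidence_gain(y, x2, x1.reshape(-1, 1)))
    print("gate:", mdl_gate(500, n_admissible=2, set_size=1))
```

Under this stand-in model, the population value of the gain is the conditional mutual information between child and candidate given the co-parents, provided the Gaussian linear assumption holds; a flexible predictor would be needed to capture the non-additive interactions the abstract targets.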