Critical Percolation as a Synthetic Data Model for Interpretability

Aryeh Brill; Tom Ingebretsen Carlson

Published: 11 Jun 2026, Last Modified: 11 Jun 2026 | Mech Interp Workshop ICML 2026 Virtual | poster | CC BY 4.0

Keywords: Benchmarking Interpretability

TL;DR: We introduce a synthetic dataset for interpretability based on critical mean-field percolation, with sparse power-law clusters and taxonomic hierarchy, and propose an almost linear-time algorithm to jointly sample a cluster and its latent hierarchy.

Abstract: Neural networks learn features that reflect the hierarchical, multi-scale structure of natural data. Synthetic datasets used to evaluate interpretability methods typically lack this structure, limiting their value as realistic toy models. To close this gap, we introduce a family of synthetic datasets consisting of hierarchical functions defined on critical mean-field percolation clusters embedded in a high-dimensional data space. The percolation data consists of sparse, low-dimensional fractal clusters with a power-law size distribution. Latent variables modeling a taxonomic hierarchy generate each data point's target value. The data model is analytically tractable with known critical exponents that fix its properties without requiring hyperparameter tuning. We leverage a mapping between percolation clusters, random trees, and additive coalescence to propose an almost linear-time algorithm to jointly sample a random tree and its hierarchical latent decomposition, enabling data generation at arbitrary scale. Using probing experiments, we find that the model's ground-truth latent variables can be linearly decoded from neural network activations. Together, sparsity, self-similarity, power-law statistics, and analytical tractability make critical percolation a principled testbed for interpretability research.
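As a rough illustration of the power-law size distribution mentioned in the abstract, the sketch below samples cluster sizes as the total progeny of a critical branching process with Poisson(1) offspring, a standard textbook stand-in for critical mean-field percolation. It is not the authors' almost linear-time sampling algorithm and does not build the latent hierarchy. The function name `sample_cluster_size` and the `max_size` cutoff are assumptions made for this example only.

```python
import numpy as np


def sample_cluster_size(rng, max_size=10_000):
    """Total progeny of a critical Poisson(1) branching process.

    Hypothetical helper for illustration; sizes are truncated at max_size
    so that rare, very large clusters do not dominate the run time.
    """
    size = 1      # the root node
    active = 1    # nodes in the current generation
    while active > 0 and size < max_size:
        # The sum of `active` independent Poisson(1) offspring counts
        # is itself Poisson(active), so one draw covers the generation.
        children = int(rng.poisson(active))
        size += children
        active = children
    return min(size, max_size)


rng = np.random.default_rng(0)
sizes = np.array([sample_cluster_size(rng) for _ in range(5_000)])

# Heavy tail: the fraction of clusters with size >= s should shrink
# roughly like s**(-1/2), i.e. the size density falls off like s**(-3/2).
for s in (1, 10, 100):
    print(s, float(np.mean(sizes >= s)))
```

Under these assumptions, the probability that a cluster consists of the root alone is exp(-1), and the tail fraction decays slowly with size, which is the kind of scale-free behavior the abstract attributes to the percolation data.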

Submission Number: 342
