Neural Concept Verifier: Scaling Prover-Verifier Games via Concept Encodings

ICLR 2026 Conference Submission 18469 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Interpretability, Prover-Verifier Games, Concept Bottleneck Models, Concept Explanation, XAI
TL;DR: We propose the Neural Concept Verifier (NCV), a new framework combining Prover-Verifier Games and concept-level encodings, enabling interpretable and nonlinear classification for high-dimensional data.
Abstract: While *Prover-Verifier Games* (PVGs) offer a promising and much-needed path toward verifiability in nonlinear classification models, they have not yet been applied to complex inputs such as high-dimensional images. Conversely, *Concept Bottleneck Models* (CBMs) effectively translate such data into interpretable concepts but are limited by their reliance on low-capacity linear predictors. In this work, we push toward real-world verifiability by combining the strengths of both approaches. We introduce the *Neural Concept Verifier (NCV)*, a unified framework that pairs PVGs for formal verifiability with concept encodings to handle complex, high-dimensional inputs in an interpretable way. NCV achieves this by using recent minimally supervised concept discovery models to extract structured concept encodings from raw inputs. A *prover* then selects a subset of these encodings, which a *verifier*, implemented as a nonlinear predictor, uses exclusively for decision-making. Our evaluations show that NCV outperforms CBM and pixel-based PVG classifier baselines on high-dimensional, logically complex datasets and also helps mitigate shortcut behavior. Overall, we demonstrate NCV as a promising step toward performant, verifiable AI.
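The prover-verifier pipeline over concept encodings described in the abstract can be sketched in a few lines of PyTorch. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Prover`/`Verifier` classes, the hard top-k selection, and all dimensions are hypothetical choices made for clarity, and a differentiable relaxation of the top-k step would be needed to train the prover end to end.

```python
import torch
import torch.nn as nn

class Prover(nn.Module):
    """Scores concept activations and reveals a sparse subset to the verifier."""
    def __init__(self, num_concepts: int, k: int):
        super().__init__()
        self.scorer = nn.Linear(num_concepts, num_concepts)
        self.k = k

    def forward(self, concepts: torch.Tensor) -> torch.Tensor:
        # concepts: (batch, num_concepts), e.g. from a pretrained concept-discovery model
        scores = self.scorer(concepts)
        topk = scores.topk(self.k, dim=-1).indices
        mask = torch.zeros_like(concepts).scatter_(-1, topk, 1.0)
        # The verifier only sees the selected concepts; the rest are zeroed out.
        return concepts * mask

class Verifier(nn.Module):
    """Nonlinear predictor that classifies using only the revealed concepts."""
    def __init__(self, num_concepts: int, num_classes: int, hidden: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(num_concepts, hidden), nn.ReLU(),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, revealed_concepts: torch.Tensor) -> torch.Tensor:
        return self.mlp(revealed_concepts)

# One round of the game on dummy concept encodings (all sizes are illustrative).
num_concepts, num_classes, batch = 32, 10, 4
prover, verifier = Prover(num_concepts, k=5), Verifier(num_concepts, num_classes)
concepts = torch.rand(batch, num_concepts)   # stand-in for discovered concept encodings
logits = verifier(prover(concepts))          # verifier decides from the selected subset only
print(logits.shape)                          # torch.Size([4, 10])
```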
Primary Area: interpretability and explainable AI
Submission Number: 18469