Operational Feature Fingerprints of Graph Datasets via a White-Box Signal-Subspace Probe

29 Apr 2026 (modified: 01 May 2026) | Under review for TMLR | CC BY 4.0
Abstract: Graph neural networks achieve strong node-classification performance, but learned message passing entangles ego features, neighborhood smoothing, high-pass graph differences, class geometry, and classifier-boundary effects inside opaque representations. This makes it difficult to determine why nodes are classified as they are and which graph-learning mechanisms are useful, harmful, or necessary for a given dataset. We propose WG-SRC (White-box Graph Signal–Subspace Residual Classifier), a white-box signal-subspace probe for both prediction and graph-dataset diagnosis. WG-SRC replaces learned message passing with an explicit, named graph-signal dictionary containing raw features, row- and symmetric-normalized low-pass propagation, and high-pass graph differences. It then combines Fisher coordinate selection, class-wise PCA subspaces, closed-form multi-α ridge classification, and validation-based score fusion. Because every signal block and decision module is explicit, the fitted scaffold produces both predictions and an operational fingerprint over raw-feature, low-pass, high-pass, class-geometric, and ridge-boundary mechanisms. Across six node-classification datasets, WG-SRC remains competitive with aligned reproduced baselines and achieves a positive average gain under matched repeated splits. Its fingerprints distinguish low-pass-dominated Amazon graphs, Chameleon, whose behavior mixes high-pass effects with complex class geometry, and raw- or boundary-sensitive WebKB graphs. Aligned interventions show that these fingerprints are operational: they identify when high-pass blocks behave like removable noise, when graph-derived or raw signals should be preserved, and when ridge-type boundary correction matters. Additional probes of fixed black-box components further indicate that the measured dataset fingerprints organize architectural behavior across multiple black-box model families: different measured dataset conditions repeatedly favor different inductive biases.
Thus, WG-SRC serves both as a functioning white-box classifier and as a dataset-fingerprinting probe, enabling fingerprint-conditioned analysis of how black-box graph-model components behave under different measured dataset conditions.
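The abstract names the scaffold's modules but not their exact definitions. The NumPy sketch below shows one plausible way to wire such a pipeline together; it is not the authors' WG-SRC implementation. Assumptions (all hypothetical): self-loops are added before normalization, the high-pass block is X minus its row-normalized smoothing, the Fisher score is a between-class over within-class variance ratio, class subspaces are affine PCA subspaces of fixed rank, and fusion is a single mixing weight over row-standardized ridge and subspace-residual scores chosen by validation accuracy. Function names such as `build_signal_dictionary` and `class_subspace_scores` are illustrative only.

```python
# Hypothetical sketch, not the authors' WG-SRC code. Block definitions, Fisher
# score, PCA rank, and the fusion rule are assumptions made for illustration.
import numpy as np


def build_signal_dictionary(A, X):
    """Named graph-signal blocks: raw, row/sym-normalized low-pass, high-pass."""
    A_hat = A + np.eye(A.shape[0])                    # assumed: add self-loops
    deg = A_hat.sum(axis=1)
    P_row = A_hat / deg[:, None]                      # D^{-1}(A+I)
    d = 1.0 / np.sqrt(deg)
    P_sym = d[:, None] * A_hat * d[None, :]           # D^{-1/2}(A+I)D^{-1/2}
    low_row = P_row @ X
    return {"raw": X, "low_row": low_row, "low_sym": P_sym @ X,
            "high": X - low_row}                      # assumed high-pass: X - smoothed X


def fisher_scores(Z, y):
    """Between-class over within-class variance for each coordinate."""
    mu = Z.mean(axis=0)
    between, within = np.zeros(Z.shape[1]), np.zeros(Z.shape[1])
    for c in np.unique(y):
        Zc = Z[y == c]
        between += len(Zc) * (Zc.mean(axis=0) - mu) ** 2
        within += ((Zc - Zc.mean(axis=0)) ** 2).sum(axis=0)
    return between / (within + 1e-12)


def ridge_multi_alpha(Z_tr, y_tr, Z_va, y_va, alphas, k):
    """Closed-form ridge on one-hot targets; keep the alpha with best val accuracy."""
    Y = np.eye(k)[y_tr]
    G = Z_tr.T @ Z_tr
    best = None
    for a in alphas:
        W = np.linalg.solve(G + a * np.eye(G.shape[0]), Z_tr.T @ Y)
        acc = np.mean((Z_va @ W).argmax(axis=1) == y_va)
        if best is None or acc > best[0]:
            best = (acc, a, W)
    return best


def class_subspace_scores(Z_tr, y_tr, Z, k, rank):
    """Negative squared residual of each row of Z to each class's PCA subspace."""
    S = np.empty((Z.shape[0], k))
    for c in range(k):
        Zc = Z_tr[y_tr == c]
        mu = Zc.mean(axis=0)
        _, _, Vt = np.linalg.svd(Zc - mu, full_matrices=False)
        U = Vt[:rank].T                               # top-`rank` principal directions
        R = (Z - mu) - (Z - mu) @ U @ U.T             # residual after projection
        S[:, c] = -(R ** 2).sum(axis=1)
    return S


def row_standardize(S):
    """Put each node's class scores on a common scale before fusion."""
    return (S - S.mean(axis=1, keepdims=True)) / (S.std(axis=1, keepdims=True) + 1e-12)


# Toy graph: 60 nodes, 3 classes, class-shifted features, random symmetric edges.
rng = np.random.default_rng(0)
n, f, k = 60, 8, 3
y = np.arange(n) % k
X = rng.normal(size=(n, f)) + 2.0 * np.eye(k, f)[y]
A = np.triu((rng.random((n, n)) < 0.1).astype(float), 1)
A = A + A.T
tr, va = np.arange(45), np.arange(45, n)

blocks = build_signal_dictionary(A, X)
Z = np.hstack(list(blocks.values()))                  # n x (4 * f), block order kept
keep = np.argsort(fisher_scores(Z[tr], y[tr]))[::-1][:12]
Zs = Z[:, keep]

_, alpha, W = ridge_multi_alpha(Zs[tr], y[tr], Zs[va], y[va], [0.1, 1.0, 10.0], k)
S_ridge = row_standardize(Zs @ W)
S_pca = row_standardize(class_subspace_scores(Zs[tr], y[tr], Zs, k, rank=2))

# Assumed fusion rule: one mixing weight, picked by validation accuracy.
grid = [0.0, 0.25, 0.5, 0.75, 1.0]
val_acc = [np.mean((w * S_ridge + (1 - w) * S_pca)[va].argmax(axis=1) == y[va]) for w in grid]
best_w = grid[int(np.argmax(val_acc))]
pred = (best_w * S_ridge + (1 - best_w) * S_pca).argmax(axis=1)
print(f"alpha={alpha}, fusion weight={best_w}, val acc={max(val_acc):.2f}")
```

Because the columns of `Z` are stacked block by block, `np.bincount(keep // f, minlength=4)` counts how many Fisher-selected coordinates come from the raw, row-normalized low-pass, symmetric low-pass, and high-pass blocks. That count is only a crude proxy for the kind of per-mechanism fingerprint the abstract describes, not the paper's definition.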
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Devendra_Singh_Dhami1
Submission Number: 8665