Abstract: Hypergraph neural networks (HGNNs) capture multi-way interactions but their quadratic cost limits scalability. Existing pruning methods rely on hand-crafted schedules and fixed per-level thresholds that require per-dataset tuning and treat structural pruning, contrastive learning, and loss balancing as independent objectives. We propose \textbf{TriPrune-HGNN}, an adaptive hypergraph pruning framework that replaces these hand-crafted components with four small learnable controllers: a compressibility predictor that estimates the achievable retention ratio from graph-level statistics before training; hierarchical soft gates with learned thresholds operating jointly over components, hyperedges, and nodes; attention-based contrastive mining that identifies false and hard negatives induced by topology change; and meta-learned loss balancing. Pruning is realised as a density curriculum so the model trains at its final inference sparsity. Our principal claim is a favourable accuracy--efficiency \emph{operating point}, not per-metric dominance. Against the strongest efficient baselines (AdaGLT, SHARP-Distill, Shaver) under matched tuning budgets on five heterogeneous hypergraph benchmarks, MAE improvements are small but consistent in sign: $0.004$--$0.008$ at fixed cross-dataset hyperparameters, narrowing to $0.002$--$0.004$ when the baselines are re-tuned per-dataset. The larger headline numbers ($75.5\%$ inference-time reduction, $69.7\%$ memory reduction) are computed against the unpruned HGNN baseline (Table~\ref{tab:main}, Avg rows) and are reported for context, not as the primary contribution. We additionally test out-of-domain generalisation on three real benchmarks (NTU2012, ModelNet40, House) spanning 3D-shape and political-voting hypergraph domains: the accuracy advantage, large and significant in-support, becomes small and statistically indistinguishable from zero on all three out-of-domain benchmarks ($+0.6$/$+0.3$/$+0.1$ pp, all $p>0.1$) as the deployment distribution leaves the compressibility predictor's training support, while the efficiency gain transfers cleanly across all three --- a useful diagnostic separation between the framework's transferable (efficiency) and non-transferable (accuracy) components. Three short design-motivation propositions justify the choices of gate-smoothing $\epsilon$, finite-difference perturbation $\delta$, and the three-level pruning decomposition, and a fourth proves \emph{local approximate stationarity} of the bi-level meta-update under empirically verified assumptions; they are local justifications under stated regularity conditions, not end-to-end guarantees. Code, weights, the synthetic generator, and seed-level numbers are released for reproducibility.
Submission Type: Long submission (more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=b0BLuYYJlc&referrer=%5BAuthor%20Console%5D(%2Fgroup%3Fid%3DTMLR%2FAuthors%23your-submissions)
Changes Since Last Submission: ---
This is a substantial revision rather than a polish. The current version differs from the previous one in seven concrete ways.
**1. New conceptual component — Hypergraph Compressibility Predictor (HCP).** The previous version contained no mechanism for predicting compressibility before training; pruning began from arbitrary initial thresholds. The revised paper adds a small MLP that maps ten graph-level structural statistics (density, spectral gap, degree-distribution skewness, clustering coefficient, hyperedge-size entropy, etc.) to a domain-invariant compressibility prior r⋆ ∈ (0,1). HCP is pre-trained once on a 5,000-graph synthetic corpus and applied frozen at deployment. It supplies both the initialisation and the regularisation signal for the threshold controllers and drives the cross-dataset transfer reported in Section 5.2. A new appendix validates HCP along four axes (synthetic accuracy, zero-shot real-benchmark transfer, feature importance, distribution shift). An informal Proposition 4 provides a transfer bound.
**2. Density-curriculum interpretation.** The previous version did not address whether progressive pruning trains the model on a different graph distribution than deployment. New Section 3.5 and Section 5.5 show that retention ratios stabilise by epoch ≈150 and the model trains at its final inference sparsity for the last ~25% of training, eliminating train–inference shift. A 4×4 train-density × test-density grid confirms the model operates on the diagonal. A new appendix adds a one-shot-prune-plus-fine-tune baseline alongside the cold-start baseline.
**3. Recalibrated contribution claims.** The previous abstract claimed "state-of-the-art performance across all 15 evaluation metrics." The revised abstract, introduction, and conclusion frame the contribution as a *favourable accuracy–efficiency operating point* rather than per-metric dominance. Actual MAE gaps against the strongest efficient baselines are 0.002–0.006, narrowing to 0.002–0.004 under per-dataset retuning. The 71.8%/69.7% headline figures are now explicitly stated to be measured against the unpruned baseline and reported for context only.
**4. Honest statistical reporting.** The revised significance appendix reports paired t-tests across all 210 method–dataset–metric cells (14 baselines × 5 datasets × 3 metrics) with both Bonferroni and Benjamini–Hochberg corrections, plus a per-baseline Cohen's *d* effect-size table. Three cells that fail p < 0.05 are explicitly flagged in-text (vs. AdaGLT on IMDB MAE; vs. SHARP-Distill on DBLP MAE and Amazon MAE), with no statistical claim made on those cells. The Cohen's *d* values on those cells (0.15–0.18) are consistent with the t-test outcomes.
**5. Two new robustness experiments.** (i) Per-dataset hyperparameter retuning (new Section 5.9) re-runs the three strongest baselines under independent per-dataset optimisation, showing the headline advantage is not an artefact of single-source tuning. (ii) Out-of-domain evaluation on NTU2012 3D shape recognition (new Section 5.10) tests generalisation beyond recommendation/citation hypergraphs. The accuracy gap shrinks to 0.6% (p ≈ 0.13, not statistically distinguishable) while efficiency transfers cleanly, providing a diagnostic separation between transferable (efficiency) and non-transferable (accuracy) components.
**6. Theory rescoped as design-motivation, not end-to-end guarantees.** The previous version contained three theorems with global guarantees relying on μ-strong convexity and a VC-dimension argument over the loss-weight simplex. These assumptions are not well-calibrated to deep non-convex training (strong convexity does not hold; the hypothesis class depends on both λ and Θ, so VC dimension is not bounded by dim(λ)). The revised Appendix A drops both. Three propositions are retained, explicitly framed as informal design-motivation lemmas under local Lipschitz assumptions (STE bias bound, finite-difference truncation/evaluation trade-off, hierarchical-pruning advantage under diminishing returns). A prominent scope-and-limitations box opens the appendix.
**7. Additional smaller changes.** A sampling-baselines orthogonality probe (Section 5.4: AllSet, GraphSAINT-adapted, UniGNN) anchors the position that sampling and pruning act on different axes. A uniform ~25-configuration tuning budget is documented for all 14 baselines. An adaptation protocol is provided for the two cross-domain pruning baselines marked ‡. Code, pre-trained HCP weights, the synthetic generator, cross-domain adaptation code, and seed-by-seed numbers are released as anonymised supplementary material.
The net effect is a paper with a narrower, better-calibrated headline claim, a new mechanism (HCP) absent from the previous version, substantially more robustness evidence, and a theoretical section honestly scoped to local design justifications.
---
Assigned Action Editor: ~Christopher_Morris1
Submission Number: 9018
Loading