Abstract: Neural network performance predictors are widely used to accelerate neural architecture search, but existing methods face a persistent trade-off: learning-based predictors require costly per-dataset initialization, while lightweight proxies are fast yet struggle to exploit prior experience and often degrade under dataset shift. We introduce NAP2, a hybrid performance predictor that models early training dynamics. NAP2 tracks the temporal evolution of layer-wise weight and gradient statistics over a small number of mini-batches, producing accurate rankings from as few as 100 mini-batches per candidate. Crucially, NAP2 supports cross-dataset reuse: a predictor trained on one dataset can be applied to another without fine-tuning, avoiding the re-initialization overhead incurred by many model-based approaches. Experiments on NAS-Bench-201 across CIFAR-10, CIFAR-100, and ImageNet16-120 show that NAP2 is competitive with strong hybrid baselines under limited budgets and delivers cost-effective cross-dataset transfer, outperforming established learning-curve and zero-cost baselines at short query times. We further demonstrate robustness to significant distribution shift, with a predictor trained on CIFAR-10 transferring effectively to SVHN. Our code and trained models are available at https://anonymous.4open.science/r/NAP2-6027/README.md.
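As a rough illustration of what "layer-wise weight and gradient statistics over a small number of mini-batches" could look like as predictor input, the sketch below turns per-mini-batch snapshots into a sequence of feature vectors. It is not the authors' pipeline: the function names (`layer_stats`, `dynamics_sequence`), the snapshot layout, and the choice of three statistics (L2 norm, mean, standard deviation) are assumptions made for this example only.

```python
import math


def layer_stats(values):
    """Summary statistics for one flattened tensor: [L2 norm, mean, std].

    The choice of statistics is an assumption for illustration.
    """
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    norm = math.sqrt(sum(v * v for v in values))
    return [norm, mean, math.sqrt(variance)]


def dynamics_sequence(snapshots, layer_names):
    """Convert per-mini-batch snapshots into a (steps x features) sequence.

    snapshots: list of dicts mapping a layer name to
               {"w": flattened weights, "g": flattened gradients}
               (hypothetical layout).
    """
    sequence = []
    for snap in snapshots:
        features = []
        for name in layer_names:
            features += layer_stats(snap[name]["w"])  # weight statistics
            features += layer_stats(snap[name]["g"])  # gradient statistics
        sequence.append(features)
    return sequence


# Toy input: two layers observed over two mini-batches (made-up numbers).
snapshots = [
    {"conv1": {"w": [3.0, 4.0], "g": [0.0, 0.0]},
     "fc": {"w": [1.0, 1.0], "g": [2.0, 0.0]}},
    {"conv1": {"w": [3.0, 3.0], "g": [1.0, -1.0]},
     "fc": {"w": [0.0, 2.0], "g": [0.0, 0.0]}},
]
seq = dynamics_sequence(snapshots, ["conv1", "fc"])
print(len(seq), len(seq[0]))  # number of steps, features per step
```

A sequence model would then consume `seq` one time step at a time, so that the order of the mini-batches carries information.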
Submission Type: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: **1. Predictor architecture refinement.** Replaced the single-layer LSTM (h=2048) with a 2-layer bidirectional GRU (h=128) that uses dual-path attention pooling, giving 29× fewer parameters and improved τ across all reported configurations. Tables 1–10 updated; pipeline, protocol, and conclusions unchanged.
**2. Sharpened framing in Abstract and Section 1.** Baselines named upfront; research gap stated explicitly; "limited budgets" (≤100 mini-batches, ~9 s on a GTX 1080 Ti) and "cost-effective transfer" given operational definitions; the three evaluation axes (init cost, query cost, ranking accuracy) named explicitly.
**3. Cleaned-up cross-dataset transfer protocol.** Removed the legacy target-side min-max normalization (an artifact of an earlier predictor version); pipeline now uses raw accuracies in [0, 1] with no target-side information consumed. New 3-step protocol box added to Section 4.
**4. New Appendix I + Section 6.3 — Optimizer robustness (AdamW).** SGD-trained NAP2 applied zero-shot to AdamW-trained architectures recovers ~91% of the SGD↔AdamW reference correlation (τ = 0.665), with a mechanistic analysis of weight vs gradient feature shifts and a gradient-only deployment recipe for cross-optimizer transfer.
**5. New Appendix J + Section 6.3 — Cross-search-space transfer (NB-201 ↔ NB-101).** Spans a 31× parameter range and a different cell topology; includes a pre-declared size-shortcut control sub-sample, a DeepSets encoder ablation, and bidirectional verification (NB-101 → NB-201 mean τ = 0.564). Headline cross-space ρ exceeds the strongest published training-free proxy on NB-101 CIFAR-10.
**6. New Appendix H — Ablations isolating the design choices.** Weights-only vs gradients-only vs both; ordered BiGRU vs static-MLP vs mean-pool vs shuffled; and feature-family ablation (norm / distribution / general). Expanded Appendix C.4 documents the held-out τ search over hidden dim, loss, schedule, augmentation, and pooling, plus structural design rationale.
**7. Section 4.2 rewritten and Section 3 reorganized.** Baselines now grouped by family (hybrid / learning-curve / zero-cost) with an explicit "why these baselines" plus conceptual distinctions from NAP2 for each family. Section 3 gets a four-stage roadmap, motivation for "meta-features," explicit "weights vs gradients" labeling, and a notation summary (Fig 1, App A/B).
**8. Tables/figures + closing sections.** Tables 4–6 use bold-best / underline-second-best with one-line takeaway captions and a win-count summary in Section 5.2; consistent fonts, std notation, and decimal precision across table families; new "Scope and limitations" paragraph and a brief Broader Impact paragraph added to Section 7.
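The changes above report ranking quality as τ. For readers unfamiliar with the metric, the following minimal sketch computes Kendall's τ between predicted and measured accuracies. It assumes the tie-unaware τ-a variant; which variant the paper uses is not stated here, and the function name `kendall_tau` and the sample numbers are made up for illustration.

```python
from itertools import combinations


def kendall_tau(predicted, actual):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    if len(predicted) != len(actual):
        raise ValueError("inputs must have the same length")
    if len(predicted) < 2:
        raise ValueError("need at least two items to rank")
    concordant = discordant = 0
    for i, j in combinations(range(len(predicted)), 2):
        sign = (predicted[i] - predicted[j]) * (actual[i] - actual[j])
        if sign > 0:
            concordant += 1  # pair ordered the same way in both lists
        elif sign < 0:
            discordant += 1  # pair ordered in opposite ways
    n_pairs = len(predicted) * (len(predicted) - 1) // 2
    return (concordant - discordant) / n_pairs


# Four hypothetical architectures: predictor scores vs. final accuracies.
scores = [0.1, 0.3, 0.2, 0.4]
accuracies = [0.5, 0.6, 0.7, 0.9]
tau = kendall_tau(scores, accuracies)
print(round(tau, 3))
```

A value of 1 means the predictor orders every pair of architectures correctly, 0 means no agreement beyond chance, and -1 means the ordering is fully reversed.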
Assigned Action Editor: ~Vasileios_Belagiannis1
Submission Number: 7645