Abstract: With deep neural networks (DNNs) increasingly deployed on edge devices, hardware (HW)-aware optimization techniques—such as HW-aware compression and HW-aware neural architecture search (HW-NAS)—have become essential. These methods rely on real feedback from the target hardware to tailor DNN architectures for efficient deployment. While the search can be parallelized, latency measurements via hardware-in-the-loop (HIL) remain a bottleneck due to their sequential nature. Recent approaches use latency predictors to replace costly HIL feedback, but challenges persist: (1) platform-specific predictors often require tens of thousands of samples, and (2) inaccurate predictions can mislead the NAS process. To address this, we introduce HiFi-LLP, a high-fidelity, low-cost latency predictor based on graph attention networks, augmented with a confidence metric. HiFi-LLP outperforms prior platform-specific predictors by up to 9 percentage points (p.p.) in the 10% accuracy bound and achieves a Spearman’s rank correlation of up to 0.996 across six devices in the LatBench dataset. We further propose a hybrid NAS framework that routes low-confidence predictions to HIL, achieving up to 8.6× speedup compared to typical NAS while maintaining a competitive Pareto front. Code is available at https://github.com/shamvbs/HiFi-LLP.
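The routing idea in the hybrid framework can be sketched as a simple dispatch: query the predictor first, and fall back to a hardware-in-the-loop measurement only when the reported confidence is below a threshold. The sketch below is illustrative, not the HiFi-LLP implementation; the function names (`hybrid_latency`, the toy predictor and measurement), the confidence threshold, and the toy latency model are all assumptions for demonstration.

```python
# Hedged sketch of confidence-gated latency lookup, assuming a predictor
# that returns (latency, confidence) and a slow ground-truth HIL measurement.

def hybrid_latency(arch, predict, measure, conf_threshold=0.9):
    """Return (latency, source) for a candidate architecture."""
    latency, confidence = predict(arch)
    if confidence >= conf_threshold:
        return latency, "predictor"  # trust the fast prediction
    return measure(arch), "hil"      # low confidence: measure on hardware

# Toy stand-ins (hypothetical, not from the paper): the predictor is
# confident for shallow architectures and unsure for deep ones.
def toy_predict(arch):
    return 10.0 * arch["depth"], (0.95 if arch["depth"] < 8 else 0.5)

def toy_measure(arch):
    return 10.5 * arch["depth"]  # "ground-truth" hardware latency

print(hybrid_latency({"depth": 4}, toy_predict, toy_measure))
print(hybrid_latency({"depth": 12}, toy_predict, toy_measure))
```

In a NAS loop, the cheap predictor path keeps most evaluations parallel and fast, while the HIL fallback bounds the damage that inaccurate predictions could otherwise do to the Pareto front.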
External IDs: dblp:conf/socc/SampathSFTFFMVFS25