Keywords: autotuning, graph neural networks, uncertainty calibration, compiler optimization, Bayesian optimization
TL;DR: CALO-GNN is a new graph neural network cost model for TVM that provides calibrated uncertainty estimates, reducing autotuning time by over 30% across diverse hardware.
Abstract: Autotuning is a major contributor to the compilation time of deep learning models: profiling 5,000 to 30,000 candidate schedules per operator often consumes tens of GPU-hours and delays deployment. TVM’s Meta-Schedule addresses this with a learned latency predictor, but current models produce only point estimates, which leads to two persistent failure modes: (i) over-exploitation, where the tuner settles too early on a suboptimal schedule, and (ii) over-exploration, where it spends excessive time probing poorly modeled regions. We introduce CALO-GNN, the first evidential graph neural network cost model for TVM. CALO-GNN predicts both latency and calibrated epistemic uncertainty in a single forward pass, enabling a new uncertainty-decaying UCB acquisition rule (UEC-UCB). A two-stage transfer approach leverages four million historical schedules to adapt to new devices with just two thousand measurements. Evaluated across seven fused operators and five heterogeneous hardware targets, including an NVIDIA H100 GPU and a 32-core Xeon CPU, CALO-GNN reduces overall tuning time by 32.4% and reaches within 5% of oracle performance 1.74 times faster than state-of-the-art baselines, all while staying within a strict 20 ms inference budget.
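The abstract does not spell out the exact form of UEC-UCB, so the following is only a minimal sketch of what an uncertainty-decaying UCB acquisition over candidate schedules might look like; the function name, the linear decay schedule, and the `beta0` parameter are all illustrative assumptions, not the paper's definition.

```python
import numpy as np

def uec_ucb_scores(pred_latency, epistemic_std, step, total_steps, beta0=2.0):
    """Hypothetical uncertainty-decaying UCB acquisition.

    Lower predicted latency is better, so the mean is negated; the
    exploration bonus shrinks linearly as tuning progresses (assumed
    decay schedule, not the paper's).
    """
    beta = beta0 * (1.0 - step / total_steps)  # decaying exploration weight
    return -pred_latency + beta * epistemic_std

# Example: rank 5 candidate schedules at tuning step 100 of 1000.
mu = np.array([3.2, 2.9, 3.5, 2.7, 4.1])      # predicted latency (ms)
sigma = np.array([0.1, 0.4, 0.05, 0.6, 0.2])  # calibrated epistemic std
scores = uec_ucb_scores(mu, sigma, step=100, total_steps=1000)
best = int(np.argmax(scores))                  # candidate to profile next
```

Early in tuning the bonus term dominates, steering the tuner toward poorly modeled regions; as the decay drives `beta` toward zero, selection reduces to pure exploitation of the predicted latency, which matches the over-exploration/over-exploitation trade-off the abstract describes.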
Submission Number: 6