Abstract: Modern Neural Architecture Search (NAS) focuses on finding the best-performing architectures in hardware-aware settings, e.g., those with an optimal tradeoff between accuracy and latency. Because prediction models have many advantages over live measurements, the search process is often guided by estimates of how well each candidate network architecture performs on the desired metrics. Typical prediction models range from operation-wise lookup tables to gradient-boosted trees and neural networks, with little known about how they compare. We evaluate 18 different performance predictors on ten combinations of metrics, devices, network types, and training tasks, and find that MLP models are the most promising. We then simulate and evaluate how the guidance of such prediction models affects the subsequent architecture selection. Due to inaccurate predictions, the selected architectures are generally suboptimal, which we quantify as an expected reduction in accuracy and hypervolume. We show that simply verifying the predictions of just the selected architectures can lead to substantially improved results. Under a time budget, we find it preferable to use a fast but inaccurate prediction model rather than accurate but slow live measurements.
Keywords: Neural Architecture Search, Neural Networks, Artificial Intelligence
One-sentence Summary: We study which predictors for hardware metrics (latency, ...) perform best in the context of Neural Architecture Search, and how they affect the search.
Track: Main track
Reproducibility Checklist: Yes
Broader Impact Statement: Yes
Paper Availability And License: Yes
Code Of Conduct: Yes
Reviewers: Kevin Laube, firstname.lastname@example.org
Main Paper And Supplementary Material: pdf
Code And Dataset Supplement: zip
Steps For Environmental Footprint Reduction During Development: In preliminary tests, we noticed that most predictor methods train equally fast on CPUs and GPUs; we thus did not use any GPUs in our evaluation. We further limit the evaluation of most predictors to a selection of non-redundant and non-trivial datasets, but show that even linear regression easily solves most of the remaining ones. The number of CPU hours is obtained from the total amount of CPU seconds (using only 2/24 cores, thus divided by 12) plus an estimated buffer of ~25h needed for a much cheaper simulation. Note that the stated time does not include the cost of acquiring the hardware metrics (latency, energy consumption, ...) on the different devices.
CPU Hours: 1200
GPU Hours: 0
TPU Hours: 0
Evaluation Metrics: No
Class Of Approaches: This basic study considers the upper bound of perfect knowledge (ground truth)
Datasets And Benchmarks: HW-NAS-Bench, TransNAS-Bench-101
Performance Metrics: Kendall's Tau, Spearman, Pearson
Benchmark Time: HW-NAS-Bench: 44.1 CPU days; TransNAS-Bench-101: 4.8 CPU days
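The performance metrics listed above (Kendall's Tau, Spearman, Pearson) compare predicted against measured values. As a minimal illustration only (not the paper's evaluation code; in practice a library such as scipy.stats would be used, and ties are ignored here), they can be sketched in pure Python:

```python
# Minimal pure-Python sketch of the three correlation metrics:
# Pearson (linear), Spearman (Pearson on ranks), Kendall's Tau-a (pair ordering).
# Illustrative only; assumes no ties in the data.
from itertools import combinations
from math import sqrt

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(v):
    # Rank positions starting at 1 (no tie handling).
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    for rank, i in enumerate(order):
        r[i] = rank + 1.0
    return r

def spearman(x, y):
    # Spearman correlation is Pearson correlation of the ranks.
    return pearson(ranks(x), ranks(y))

def kendall_tau(x, y):
    # Tau-a: (concordant - discordant) pairs over all pairs.
    n = len(x)
    s = sum(
        1 if (x[i] - x[j]) * (y[i] - y[j]) > 0 else -1
        for i, j in combinations(range(n), 2)
    )
    return s / (n * (n - 1) / 2)

# Hypothetical example: predicted vs. measured latency ranking of 5 architectures.
predicted = [1.0, 2.0, 3.0, 4.0, 5.0]
measured = [1.0, 2.0, 3.0, 5.0, 4.0]  # last two architectures swapped
print(pearson(predicted, measured))      # 0.9
print(spearman(predicted, measured))     # 0.9
print(kendall_tau(predicted, measured))  # 0.8
```

Rank-based metrics such as Kendall's Tau and Spearman are the natural fit for NAS, since the search only needs the relative ordering of architectures to be correct, not the absolute predicted values.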