In-Context Learning for Latency Estimation

Published: 12 Jul 2024, Last Modified: 21 Aug 2024 · AutoML 2024 Workshop · CC BY 4.0
Keywords: In-context Learning, Latency estimation, Neural Architecture Search (NAS), Prior-Data Fitted Networks, Hardware-aware NAS
TL;DR: In-context learning for latency estimation that improves prediction quality and sample efficiency
Abstract: Neural architectures must be computationally efficient to perform well under the hardware constraints of edge-device deployment. Current Hardware-Aware NAS (HW-NAS) methods use surrogate models to predict hardware metrics (e.g., latency) during architecture search. These surrogates typically require large amounts of data to train or finetune and rely on search-space-specific encodings and meta-learning. We propose an In-Context Learning-based method for hardware latency estimation that generalises to unseen hardware in a single forward pass, given only a few labelled samples. Our surrogate is trained on real architecture-latency pairs with data augmentation to improve sample efficiency. Our method surpasses the state of the art in Spearman's $\rho$ and is 72\% to 92\% more sample efficient.
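To make the setup described in the abstract concrete, below is a minimal sketch of what a PFN-style in-context latency surrogate could look like at inference time: a transformer receives a few (architecture, latency) context pairs from an unseen device plus unlabelled query architectures, and predicts the query latencies in a single forward pass. All class names, dimensions, and hyperparameters here are illustrative assumptions, not the authors' released code, and a full PFN would additionally mask attention between query positions.

```python
# Hypothetical sketch of in-context latency estimation (PFN-style surrogate).
import torch
import torch.nn as nn

class ICLLatencyEstimator(nn.Module):
    """Maps a few labelled (architecture, latency) context pairs plus query
    architectures to latency predictions in one forward pass."""

    def __init__(self, arch_dim: int, d_model: int = 128, n_layers: int = 4):
        super().__init__()
        self.arch_embed = nn.Linear(arch_dim, d_model)
        self.latency_embed = nn.Linear(1, d_model)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, 1)  # point estimate of latency

    def forward(self, ctx_archs, ctx_lat, query_archs):
        # Context tokens carry the architecture encoding and its measured
        # latency; query tokens carry the architecture encoding only.
        ctx = self.arch_embed(ctx_archs) + self.latency_embed(ctx_lat.unsqueeze(-1))
        qry = self.arch_embed(query_archs)
        h = self.encoder(torch.cat([ctx, qry], dim=1))
        # Predictions are read off the query positions.
        return self.head(h[:, ctx.shape[1]:, :]).squeeze(-1)

# Single forward pass: 10 labelled samples from an unseen device, 5 queries.
model = ICLLatencyEstimator(arch_dim=32)
ctx_a, ctx_l = torch.randn(1, 10, 32), torch.rand(1, 10)
preds = model(ctx_a, ctx_l, torch.randn(1, 5, 32))  # shape (1, 5)
```

Because adaptation to new hardware happens entirely in-context, no gradient updates are needed at deployment time, which is where the claimed sample-efficiency gains over finetuned surrogates would come from.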
Submission Checklist: Yes
Broader Impact Statement: Yes
Paper Availability And License: Yes
Code Of Conduct: Yes
Optional Meta-Data For Green-AutoML: All questions below on environmental impact are optional.
CPU Hours: 360
GPU Hours: 360
TPU Hours: 0
Evaluation Metrics: No
Submission Number: 30