AdaInf: Adaptive Inference for Resource-Constrained Foundation Models

Published: 21 Jun 2024 · Last Modified: 26 Jul 2024 · ES-FoMo-II 2024 Poster · CC BY 4.0
Keywords: Foundation model, Adaptive Inference, Efficient Adaptation
TL;DR: We propose AdaInf, a framework that optimizes foundation models for resource-limited scenarios by dynamically selecting model components based on input samples and MACs budgets, improving efficiency and accuracy.
Abstract: Foundation models have emerged as a powerful tool in AI, yet they come with substantial computational cost, limiting their deployment on resource-constrained devices. Much recent research has been dedicated to improving the efficiency of foundation models. These prior solutions often yield models with a static accuracy and latency footprint, and thus fall short in responding to runtime perturbations, including varying input characteristics (e.g., a static video vs.\ a dynamic one) or changing resource availability (e.g., contention from other programs on the device). To bridge this gap, we introduce \textbf{AdaInf}---an adaptive inference framework that treats a foundation model as a collection of execution branches and learns a scheduler to decide which branch to execute, accounting for the input data and a compute budget. We demonstrate preliminary results on CIFAR and ImageNet with vision and vision-language models, spanning both convolutional networks and Transformers. Our results show that AdaInf can achieve varying accuracy and latency trade-offs. Compared to the latest method, AdaInf attains a substantial improvement in accuracy across a wide range of latency budgets.
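
To make the branch-selection idea concrete, below is a minimal, hypothetical sketch of budget-aware branch scheduling in the spirit of the abstract: a small learned scorer picks, per sample, the highest-scoring execution branch whose MACs cost fits the current budget. The branch list, cost values, feature extractor, and all names are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch: input- and budget-aware branch selection (not AdaInf's actual code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class BranchScheduler(nn.Module):
    """Small MLP that scores execution branches from a cheap input feature."""
    def __init__(self, feat_dim: int, num_branches: int):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, num_branches)
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return self.scorer(feats)  # (batch, num_branches) branch scores

def adaptive_infer(x, branches, branch_macs, scheduler, macs_budget):
    """Run each sample through the highest-scoring branch that fits the MACs budget."""
    feats = F.adaptive_avg_pool2d(x, 1).flatten(1)   # cheap per-sample feature
    scores = scheduler(feats)                        # (batch, num_branches)
    costs = torch.tensor(branch_macs)
    # Mask out branches whose cost exceeds the current compute budget.
    scores = scores.masked_fill(costs > macs_budget, float("-inf"))
    choice = scores.argmax(dim=-1)                   # selected branch per sample
    # For clarity, run one branch per sample; a real system would batch by branch.
    outs = [branches[int(c)](xi.unsqueeze(0)) for xi, c in zip(x, choice)]
    return torch.cat(outs, dim=0)

# Toy usage: three branches of increasing cost (e.g., progressively larger sub-models).
branches = nn.ModuleList([
    nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(3, 10)),
    nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.AdaptiveAvgPool2d(1),
                  nn.Flatten(), nn.Linear(8, 10)),
    nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.AdaptiveAvgPool2d(1),
                  nn.Flatten(), nn.Linear(16, 10)),
])
scheduler = BranchScheduler(feat_dim=3, num_branches=3)
logits = adaptive_infer(torch.randn(4, 3, 32, 32), branches,
                        branch_macs=[1e6, 5e6, 2e7], scheduler=scheduler,
                        macs_budget=1e7)
```

In this sketch, lowering `macs_budget` masks out more expensive branches, so the same model gracefully trades accuracy for latency at runtime; how the scheduler is trained and how branches are defined are left unspecified here.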
Submission Number: 30