Content-aware Input Scaling and Deep Learning Computation Offloading for Low-Latency Embedded Vision
Abstract: Deploying deep learning (DL) models for visual recognition on embedded systems is constrained by limited compute power and storage capacity, together with stringent latency and power requirements. As emerging DL applications evolve, they demand computational resources that embedded vision systems cannot provision on their own. One promising way to overcome these limitations is computation offloading. However, realizing a performance gain requires careful task partitioning that accounts for both data quality and communication overhead. In this paper, we introduce a novel framework for content-aware offloading of DL computations that maximizes quality-of-service while adhering to latency constraints. In our framework, the embedded vision system (edge device) intelligently compresses data in a content-aware manner using a lightweight model and transmits it to a more powerful server. The framework consists of two key components: offline training for efficient content-aware data scaling, and online control that adapts to network variations in real time. To demonstrate the effectiveness of our approach, we apply it to several downstream tasks, including face identification, person keypoint detection, and instance segmentation, showing significant improvements in the overall quality of results across applications.
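To make the described pipeline concrete, the following is a minimal sketch, not the paper's implementation: it substitutes a simple edge-density heuristic for the learned lightweight scaling model and a bandwidth-based latency estimate for the online controller. All function names (choose_scale, compress, offload_or_fallback) and parameters are hypothetical.

```python
import io
import numpy as np
from PIL import Image

def choose_scale(image: Image.Image) -> float:
    """Stand-in for the learned lightweight scaling model: a simple
    edge-density heuristic maps content richness to a resize factor."""
    gray = np.asarray(image.convert("L"), dtype=np.float32)
    edge_density = np.abs(np.diff(gray, axis=1)).mean() / 255.0
    # Richer content keeps more resolution; clamp to [0.25, 1.0].
    return float(np.clip(0.25 + 3.0 * edge_density, 0.25, 1.0))

def compress(image: Image.Image, scale: float, quality: int = 80) -> bytes:
    """Content-aware downscaling followed by JPEG encoding."""
    w, h = image.size
    small = image.convert("RGB").resize(
        (max(1, int(w * scale)), max(1, int(h * scale))))
    buf = io.BytesIO()
    small.save(buf, format="JPEG", quality=quality)
    return buf.getvalue()

def offload_or_fallback(image, bandwidth_bps, latency_budget_s,
                        server_time_s, local_infer, remote_infer):
    """Online control: estimate end-to-end latency from the currently
    measured bandwidth and offload only if the budget is met."""
    payload = compress(image, choose_scale(image))
    est_latency_s = len(payload) * 8 / bandwidth_bps + server_time_s
    if est_latency_s <= latency_budget_s:
        return remote_infer(payload)   # send compressed frame to server
    return local_infer(image)          # degrade gracefully on-device
```

In this sketch, the content-aware element is the per-frame scale choice, and the online-control element is the latency check against the measured bandwidth; the paper's framework learns the former offline and adapts the latter to network variations at run time.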