Efficient Online DNN Inference with Continuous Learning in Edge Computing

Published: 01 Jan 2024 · Last Modified: 06 Feb 2025 · IWQoS 2024 · CC BY-SA 4.0
Abstract: Compressed DNN models deployed at the edge typically lose inference accuracy over time due to data drift. To maintain accuracy, models are commonly retrained at the edge via continuous learning. However, online edge DNN inference with continuous learning faces new challenges. First, retraining jobs compete for resources with the existing edge inference tasks, which increases inference latency. Second, retraining jobs and inference tasks differ significantly in workload and latency requirements, so the two cannot share a single scheduling policy. To overcome these challenges, we propose an Online scheduling algorithm for INference with Continuous learning (OINC). OINC minimizes the weighted sum of the latency of inference tasks and the completion time of retraining jobs under limited edge resources, while satisfying each inference task's service-level objective (SLO) and meeting the deadlines of retraining jobs. OINC first reserves a portion of the resources to complete all current inference tasks and allocates the remainder to retraining jobs. Based on the reserved resource ratio, it then invokes two sub-algorithms to select edges and allocate resources for each inference task and each retraining job, respectively. Compared with six state-of-the-art algorithms, OINC reduces the weighted sum by up to 23.7% and increases the success rate by up to 35.6%.
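To make the two-stage structure described in the abstract concrete, below is a minimal Python sketch of a reserve-then-allocate scheduler. It is not the paper's actual OINC algorithm: the class names, the `reserve_ratio` parameter, and the tightest-SLO-first / earliest-deadline-first heuristics are all illustrative assumptions standing in for the paper's two sub-algorithms.

```python
from dataclasses import dataclass, field

@dataclass
class InferenceTask:
    name: str
    demand: float       # resource units needed to meet its SLO latency
    slo_latency: float  # latency bound in seconds

@dataclass
class RetrainJob:
    name: str
    work: float      # total compute work, in resource-seconds
    deadline: float  # completion deadline in seconds

@dataclass
class Edge:
    name: str
    capacity: float  # total resource units
    free: float = field(init=False)

    def __post_init__(self):
        self.free = self.capacity

def schedule(edges, tasks, jobs, reserve_ratio=0.6):
    """Two-stage sketch: (1) place inference tasks inside a reserved share
    of each edge's capacity, tightest SLO first; (2) give retraining jobs
    the leftover capacity, earliest deadline first."""
    placement = {}
    # Stage 1: inference tasks may only consume the reserved share.
    for t in sorted(tasks, key=lambda t: t.slo_latency):
        for e in edges:
            # Free capacity remaining within this edge's inference reservation.
            reserved_free = e.free - (1 - reserve_ratio) * e.capacity
            if reserved_free >= t.demand:
                e.free -= t.demand
                placement[t.name] = e.name
                break
        else:
            return None  # infeasible: an SLO cannot be met with this reservation
    # Stage 2: retraining jobs share whatever capacity remains.
    for j in sorted(jobs, key=lambda j: j.deadline):
        rate_needed = j.work / j.deadline     # minimum sustained rate to finish on time
        e = max(edges, key=lambda e: e.free)  # edge with the most residual capacity
        if e.free < rate_needed:
            return None  # a retraining deadline would be missed
        e.free -= rate_needed
        placement[j.name] = e.name
    return placement

if __name__ == "__main__":
    edges = [Edge("edge-1", 10.0), Edge("edge-2", 8.0)]
    tasks = [InferenceTask("detect", 4.0, 0.05), InferenceTask("classify", 3.0, 0.10)]
    jobs = [RetrainJob("retrain-detect", 30.0, 20.0)]
    print(schedule(edges, tasks, jobs, reserve_ratio=0.6))
```

The sketch returns None when either stage fails, mirroring the abstract's success-rate metric: a schedule only counts as successful if every SLO and every retraining deadline can be met under the chosen reservation ratio.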