DeepSlicing: Collaborative and Adaptive CNN Inference With Low Latency

Shuai Zhang, Sheng Zhang, Zhuzhong Qian, Jie Wu, Yibo Jin, Sanglu Lu

Published: 2021, Last Modified: 29 Jan 2026, IEEE Trans. Parallel Distributed Syst. 2021, CC BY-SA 4.0

Abstract: The rapid rise of Convolutional Neural Networks (CNNs) has empowered many computer-vision applications. Because CNNs place stringent demands on computing resources, substantial research has been conducted on how to optimize their deployment and execution on resource-constrained devices. However, previous works have several weaknesses, including limited support for various CNN structures, fixed scheduling strategies, overlapped computations, and high synchronization overheads. In this article, we present DeepSlicing, a collaborative and adaptive inference system that adapts to various CNNs and supports customized, flexible, fine-grained scheduling. As built-in functionality, DeepSlicing supports typical CNNs, including GoogLeNet and ResNet. By partitioning both model and data, we also design an efficient scheduler, the Proportional Synchronized Scheduler (PSS), which achieves a trade-off between computation and synchronization. We have implemented DeepSlicing in PyTorch on a testbed with real-world edge settings that consists of 8 heterogeneous Raspberry Pis. The results indicate that DeepSlicing with PSS substantially outperforms existing systems, e.g., the inference latency and memory footprint are reduced by up to 5.79× and 14.72×, respectively.
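
To make the idea of data partitioning across heterogeneous devices concrete, the following is a minimal sketch, not the paper's PSS algorithm. It assumes that each device is described by a single relative speed and that an input feature map is divided into contiguous row ranges sized in proportion to those speeds. The function name `proportional_split` and the speed values are hypothetical and chosen only for illustration.

```python
def proportional_split(total_rows, speeds):
    """Split `total_rows` into contiguous (start, end) row ranges whose
    sizes are roughly proportional to the given relative device speeds.

    Illustrative assumption: one scalar speed per device. This is not
    the scheduler described in the paper.
    """
    if total_rows < 0 or not speeds or any(s <= 0 for s in speeds):
        raise ValueError("need total_rows >= 0 and positive speeds")

    total_speed = sum(speeds)
    bounds = [0]
    accumulated = 0.0
    for s in speeds:
        accumulated += s
        # Round cumulative boundaries so the ranges stay contiguous.
        bounds.append(round(total_rows * accumulated / total_speed))
    bounds[-1] = total_rows  # guard against floating-point drift

    return [(bounds[i], bounds[i + 1]) for i in range(len(speeds))]


# Hypothetical example: 224 input rows, third device twice as fast.
slices = proportional_split(224, [1, 1, 2])
print(slices)
```

In this sketch a faster device simply receives a larger slice of rows. A real system would also have to account for the overlapping border rows that convolutions need and for the cost of synchronizing partial results, which is the trade-off the abstract attributes to PSS.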

External IDs: dblp:journals/tpds/ZhangZQWJL21