PimShare: Scheduling for Multi-DNN Inference on Processing-in-memory Accelerated Edge Server

Xinyu Chen, Zhongle Xie, Huan Li, Ke Chen, Lidan Shou, Dawei Jiang, Gang Chen

Published: 01 Jan 2025, Last Modified: 04 Feb 2026. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. License: CC BY-SA 4.0
Abstract: Deep neural network (DNN) models are crucial for Internet-of-Things (IoT) applications. In a multi-access edge computing (MEC) system, IoT devices dispatch their requests to a nearby edge server to accelerate DNN inference. Recently, processing-in-memory (PIM) accelerators have emerged for DNN inference due to their power and area efficiency. However, current scheduling methods primarily target single-DNN workloads and do not fully exploit PIM's potential for managing multi-DNN workloads in MEC scenarios. Additionally, PIM accelerators face challenges such as inflexible resource allocation and costly write operations, which constrain the direct application of GPU scheduling techniques. In this paper, we formulate the multi-DNN scheduling problem on the PIM accelerator. To reduce the problem's complexity, we propose a heuristic optimization algorithm, called Leader-Follower. Building on this algorithm, we present PimShare, a scheduler that enables the PIM accelerator to process multi-DNN workloads. PimShare schedules multi-DNN inference requests for both temporal and spatial multiplexing, thereby enhancing hardware utilization. Compared with baseline scheduling methods, including those adapted from GPU scheduling, PimShare achieves up to a two-order-of-magnitude improvement in throughput and sustains this high throughput on workloads containing 18 DNN models. In addition, PimShare reduces write operations and processing latency.