Abstract: Deep neural network (DNN) models are crucial for Internet-of-Things (IoT) applications. In a multi-access edge computing (MEC) system, IoT devices dispatch their requests to a nearby edge server to accelerate DNN inference. Recently, processing-in-memory (PIM) accelerators have emerged for DNN inference due to their power and area efficiency. However, current scheduling methods primarily focus on single-DNN workloads and do not fully exploit PIM's potential for managing multi-DNN workloads in MEC scenarios. Moreover, PIM accelerators face challenges such as inflexible resource allocation and costly write operations, which constrain the direct application of GPU scheduling techniques. In this paper, we formulate the multi-DNN scheduling problem on the PIM accelerator. To reduce the problem's complexity, we propose a heuristic optimization algorithm called Leader-Follower. Building on this algorithm, we present PimShare, a scheduler that enables the PIM accelerator to process multi-DNN workloads. PimShare schedules multi-DNN inference requests with both temporal and spatial multiplexing, which enhances hardware utilization. Compared with baseline scheduling methods, including those adapted from GPU scheduling, PimShare achieves up to a two-order-of-magnitude improvement in throughput and sustains this high throughput on workloads containing 18 DNN models. In addition, PimShare reduces write operations and processing latency.
External IDs: doi:10.1109/tcad.2025.3641876