HeteroPipe: Pipelining Multi-DNN Inference on Heterogeneous Mobile Processors under Co-Execution Slowdown
Abstract: Emerging multi-modal applications, exemplified by multi-DNN inference, have renewed interest in mobile intelligence. The goal is to utilize the heterogeneous processors on board to maximize throughput and resource utilization. Among a variety of options, building model-parallel pipelines across different processors is a promising approach. However, existing efforts either focus on optimizing homogeneous DNN executions or simply ignore co-execution slowdown on the shared memory bus. Based on extensive empirical studies and insights under various degrees of resource contention, we introduce HeteroPipe, which combines a two-step pipeline planner based on dynamic programming with contention-mitigated pipeline bubble minimization to make the problem tractable within a manageable search space. Extensive evaluation across three commercial SoCs demonstrates a 2-8× speedup over state-of-the-art schemes.
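The paper's planner is not detailed in this abstract. As a rough, generic illustration of the underlying idea, the sketch below uses dynamic programming to partition a chain of DNN layers into contiguous pipeline stages, one per heterogeneous processor, minimizing the bottleneck stage latency (which bounds pipeline throughput). All layer costs, processor names, and the uniform contention slowdown factor are hypothetical placeholders, not values or algorithms from the paper.

```python
from functools import lru_cache

# Hypothetical per-layer latencies (ms) on each processor; illustrative only.
LAYER_COST = {
    "cpu": [4.0, 6.0, 5.0, 3.0],
    "gpu": [2.0, 2.5, 3.0, 1.5],
    "npu": [1.0, 1.2, 4.0, 0.8],
}
# Assumed uniform slowdown factor standing in for shared-memory-bus contention
# when stages co-execute; the paper models contention far more carefully.
SLOWDOWN = 1.3

def plan(num_layers, procs):
    """Split layers [0, num_layers) into contiguous stages, one per processor,
    minimizing the slowest stage's latency (the pipeline bottleneck)."""
    @lru_cache(maxsize=None)
    def best(start, pi):
        proc = procs[pi]
        if pi == len(procs) - 1:
            # Last processor takes all remaining layers.
            return sum(LAYER_COST[proc][start:]) * SLOWDOWN, (num_layers,)
        best_val, best_cuts = float("inf"), None
        # Leave at least one layer for each remaining processor.
        for cut in range(start + 1, num_layers - (len(procs) - pi - 2)):
            stage = sum(LAYER_COST[proc][start:cut]) * SLOWDOWN
            rest, cuts = best(cut, pi + 1)
            val = max(stage, rest)  # pipeline throughput is set by the bottleneck
            if val < best_val:
                best_val, best_cuts = val, (cut,) + cuts
        return best_val, best_cuts
    return best(0, 0)

bottleneck, cut_points = plan(4, ("cpu", "gpu", "npu"))
```

With the toy numbers above, the planner assigns layer 0 to the CPU, layer 1 to the GPU, and layers 2-3 to the NPU; a real planner would additionally account for inter-stage transfer costs and contention that varies with the co-running stages.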
External IDs: dblp:conf/icdcs/ShenWWGWSW25