HeteroPipe: Pipelining Multi-DNN Inference on Heterogeneous Mobile Processors under Co-Execution Slowdown
Abstract: Emerging multi-modal applications, exemplified by multi-DNN inference, have renewed interest in mobile intelligence. The goal is to utilize the heterogeneous processors on board to maximize throughput and resource utilization. Among a variety of options, building model-parallel pipelines across different processors is a promising approach. However, existing efforts either focus on optimizing homogeneous DNN executions or simply ignore co-execution slowdown on the shared memory bus. Based on extensive empirical studies and insights under various degrees of resource contention, we introduce HeteroPipe, which combines a two-step pipeline planner based on dynamic programming with contention-mitigated pipeline bubble minimization to make the problem tractable within a manageable search space. Extensive evaluation across three commercial SoCs demonstrates a 2-8× speedup over state-of-the-art schemes.
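The paper's planner is not detailed in this abstract. As a rough, generic illustration of the underlying idea, the sketch below uses dynamic programming to partition a chain of DNN layers into contiguous pipeline stages, one per heterogeneous processor, minimizing the bottleneck stage latency (which bounds pipeline throughput). All layer costs, processor names, and the uniform contention slowdown factor are hypothetical placeholders, not values or algorithms from the paper.

```python
from functools import lru_cache

# Hypothetical per-layer latencies (ms) on each processor; illustrative only.
LAYER_COST = {
    "cpu": [4.0, 6.0, 5.0, 3.0],
    "gpu": [2.0, 2.5, 3.0, 1.5],
    "npu": [1.0, 1.2, 4.0, 0.8],
}
# Assumed uniform slowdown factor standing in for shared-memory-bus contention
# when stages co-execute; the paper models contention far more carefully.
SLOWDOWN = 1.3

def plan(num_layers, procs):
    """Split layers [0, num_layers) into contiguous stages, one per processor,
    minimizing the slowest stage's latency (the pipeline bottleneck)."""
    @lru_cache(maxsize=None)
    def best(start, pi):
        proc = procs[pi]
        if pi == len(procs) - 1:
            # Last processor takes all remaining layers.
            return sum(LAYER_COST[proc][start:]) * SLOWDOWN, (num_layers,)
        best_val, best_cuts = float("inf"), None
        # Leave at least one layer for each remaining processor.
        for cut in range(start + 1, num_layers - (len(procs) - pi - 2)):
            stage = sum(LAYER_COST[proc][start:cut]) * SLOWDOWN
            rest, cuts = best(cut, pi + 1)
            val = max(stage, rest)  # pipeline throughput is set by the bottleneck
            if val < best_val:
                best_val, best_cuts = val, (cut,) + cuts
        return best_val, best_cuts
    return best(0, 0)

bottleneck, cut_points = plan(4, ("cpu", "gpu", "npu"))
```

With the toy numbers above, the planner assigns layer 0 to the CPU, layer 1 to the GPU, and layers 2-3 to the NPU; a real planner would additionally account for inter-stage transfer costs and contention that varies with the co-running stages.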
External IDs: dblp:conf/icdcs/ShenWWGWSW25