Abstract: We propose Layer-Puzzle, a multi-task allocation and scheduling framework for multi-core NPUs. Using the proposed latency-prediction model and dynamic parallelization scheme, Layer-Puzzle generates near-optimal results for each layer under the given hardware resources and traffic congestion levels. As an online scheduler, Layer-Puzzle applies a QoS-aware dynamic scheduling method that selects the best version among the previously compiled results and co-runs the selected tasks to improve system performance. Our experiments on MLPerf show that Layer-Puzzle achieves up to 1.61×, 1.53×, and 1.95× improvement in ANTT, STP, and PE utilization, respectively.
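
The abstract reports ANTT and STP without defining them. As a minimal sketch, assuming the conventional multi-program definitions (ANTT as the mean per-task slowdown relative to isolated execution, where lower is better, and STP as the sum of per-task normalized progress, where higher is better), the two metrics could be computed as follows. The function names and the sample latencies are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def _check(isolated: Sequence[float], corun: Sequence[float]) -> None:
    # Both lists must describe the same non-empty set of tasks.
    if not isolated or len(isolated) != len(corun):
        raise ValueError("need one isolated and one co-run latency per task")
    if any(t <= 0 for t in list(isolated) + list(corun)):
        raise ValueError("latencies must be positive")


def antt(isolated: Sequence[float], corun: Sequence[float]) -> float:
    """Average normalized turnaround time: mean of corun/isolated (lower is better)."""
    _check(isolated, corun)
    return sum(c / i for i, c in zip(isolated, corun)) / len(isolated)


def stp(isolated: Sequence[float], corun: Sequence[float]) -> float:
    """System throughput: sum of isolated/corun (higher is better)."""
    _check(isolated, corun)
    return sum(i / c for i, c in zip(isolated, corun))


if __name__ == "__main__":
    # Hypothetical latencies (ms) for two tasks, run alone and co-run.
    alone = [10.0, 20.0]
    together = [20.0, 25.0]
    print("ANTT:", antt(alone, together))
    print("STP:", stp(alone, together))
```

Under these definitions an "improvement in ANTT" corresponds to a reduction of the metric, whereas an improvement in STP corresponds to an increase; whether the paper normalizes its reported factors this way is an assumption.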