Accelerate DNN Inference By Inter-Operator Parallelization

25 Sept 2019 (modified: 05 May 2023) · ICLR 2020 Conference Withdrawn Submission
Abstract: High utilization is key to achieving high efficiency for deep neural networks. Existing deep learning frameworks have focused on improving the performance of individual operators but have ignored parallelization across operators. This leads to low device utilization, especially for complex deep neural networks (DNNs) with many small operators, such as Inception and NASNet. To make complex DNNs more efficient, we need to execute their operators in parallel. However, a naive greedy schedule causes heavy resource contention and does not yield the best performance. In this work, we propose Deep Optimal Scheduling (DOS), a general dynamic programming algorithm that finds an optimal schedule to improve utilization via parallel execution. Specifically, DOS optimizes the execution for the given hardware and inference settings. Our experiments demonstrate that DOS consistently outperforms existing deep learning libraries by 1.2 to 1.4× on widely used complex DNNs.
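
The abstract does not spell out the dynamic program, so the following is a minimal illustrative sketch, not the authors' DOS implementation. It assumes a schedule is a sequence of "stages" of concurrently launched operators and runs a dynamic program over subsets of completed operators. The toy DAG preds and the unit-cost stage_latency model are hypothetical placeholders; a real scheduler would measure stage latency on the target hardware and inference settings.

    from functools import lru_cache

    # Operators form a DAG; a schedule is a sequence of stages, each a set of
    # operators launched concurrently. A stage is valid only if every
    # predecessor of its operators finished in an earlier stage.

    # Hypothetical toy DAG: op -> set of predecessor ops.
    preds = {
        "a": set(), "b": set(),
        "c": {"a"}, "d": {"a", "b"},
        "e": {"c", "d"},
    }
    ops = sorted(preds)
    index = {op: i for i, op in enumerate(ops)}

    def stage_latency(stage):
        """Hypothetical cost model: unit cost per stage, concurrency free.
        In practice this would be measured on the target hardware."""
        return 1.0

    def bit(op):
        return 1 << index[op]

    FULL = (1 << len(ops)) - 1

    @lru_cache(maxsize=None)
    def best(done):
        """Minimum latency to finish the remaining operators, given the
        bitmask `done` of already-executed operators."""
        if done == FULL:
            return 0.0
        # Operators that are not done and whose predecessors are all done.
        ready = [op for op in ops
                 if not (done & bit(op))
                 and all(done & bit(p) for p in preds[op])]
        # Enumerate every non-empty subset of the ready set as the next stage.
        best_cost = float("inf")
        for mask in range(1, 1 << len(ready)):
            stage = [ready[i] for i in range(len(ready)) if mask & (1 << i)]
            new_done = done
            for op in stage:
                new_done |= bit(op)
            best_cost = min(best_cost, stage_latency(stage) + best(new_done))
        return best_cost

    print(best(0))  # optimal end-to-end latency under the toy cost model

Under the unit-cost model the optimum for this toy graph is three stages, {a, b}, {c, d}, {e}, so best(0) prints 3.0; with a measured cost model the same search trades off concurrency gains against resource contention.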