ASPEN: Breaking Operator Barriers for Efficient Parallelization of Deep Neural Networks

Published: 21 Sept 2023, Last Modified: 21 Nov 2023, NeurIPS 2023 poster
Keywords: Deep Neural Network, Deep Learning, Parallel Execution Algorithm, Parallelization, Deep Learning Parallelism, Dynamic, Asynchronous, Scheduling, Dynamic Scheduling, Dynamic Execution, tile, tiling, dataflow, dataflow graph, tile-based dataflow graph, opportunistic parallelism
TL;DR: ASPEN uncovers a novel source of parallelism in DNNs by breaking the synchronization barriers between operators, and exploits it by letting each parallel resource dynamically locate and execute fine-grained parallel computation opportunities.
Abstract: Modern Deep Neural Network (DNN) frameworks use tensor operators as the main building blocks of DNNs. However, we observe that operator-based construction of DNNs incurs significant drawbacks in parallelism in the form of synchronization barriers. Synchronization barriers of operators confine the scope of parallel computation to each operator and obscure the rich parallel computation opportunities that exist across operators. To this end, we present ASPEN, a novel parallel computation solution for DNNs that achieves fine-grained dynamic execution of DNNs, which (1) removes the operator barriers and expresses DNNs in dataflow graphs of fine-grained tiles to expose the parallel computation opportunities across operators, and (2) exploits these opportunities by dynamically locating and scheduling them at runtime. This novel approach of ASPEN enables opportunistic parallelism, a new class of parallelism for DNNs that is unavailable in the existing operator-based approaches. ASPEN also achieves high resource utilization and memory reuse by letting each resource asynchronously traverse the DNN graph depthwise to its full computing potential. We discuss the challenges of our approach and our solutions to them, and show that our proof-of-concept implementation of ASPEN on CPU achieves exceptional performance, outperforming the state-of-the-art inference systems TorchScript and TVM by up to 3.2$\times$ and 4.3$\times$, respectively.
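The core idea in the abstract — replacing per-operator synchronization barriers with a dataflow graph of fine-grained tiles whose dependencies are tracked individually, so a tile of a later operator can run as soon as its own input tiles finish — can be sketched with a minimal dynamic scheduler. This is an illustrative toy, not ASPEN's actual implementation; the `Tile` class, dependency bookkeeping, and worker loop below are all hypothetical simplifications.

```python
import threading
import queue

class Tile:
    """A fine-grained unit of computation in the tile-based dataflow graph.
    Depends on specific tiles of the previous operator, not on the whole operator."""
    def __init__(self, name, deps):
        self.name = name
        self.deps = list(deps)        # names of tiles this tile waits on
        self.remaining = len(deps)    # unfinished dependencies

def run_tiles(tiles, num_workers=4):
    """Dynamic scheduling sketch: each worker asynchronously grabs any tile
    whose dependencies are satisfied, allowing execution to cross operator
    boundaries (opportunistic parallelism)."""
    lock = threading.Lock()
    ready = queue.Queue()
    consumers = {}                    # tile name -> tiles that depend on it
    for t in tiles:
        for d in t.deps:
            consumers.setdefault(d, []).append(t)
    for t in tiles:
        if t.remaining == 0:
            ready.put(t)              # tiles with no deps are runnable at once

    order = []                        # completion order (for inspection)
    done = threading.Event()
    finished = [0]

    def worker():
        while not done.is_set():
            try:
                t = ready.get(timeout=0.05)
            except queue.Empty:
                continue
            with lock:
                order.append(t.name)  # "execute" the tile
                finished[0] += 1
                # Unlock downstream tiles as soon as *their* inputs are ready,
                # without waiting for the rest of this tile's operator.
                for c in consumers.get(t.name, []):
                    c.remaining -= 1
                    if c.remaining == 0:
                        ready.put(c)
                if finished[0] == len(tiles):
                    done.set()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return order
```

For example, with operator A split into tiles `A0`, `A1` and operator B into `B0` (needing only `A0`) and `B1` (needing only `A1`), `B0` may execute before `A1` — something an operator-level barrier would forbid.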
Supplementary Material: zip
Submission Number: 1807