Towards Efficient Elastic Parallelism for Deep Learning Processor

Published: 01 Jan 2022, Last Modified: 13 May 2025 · ISPA/BDCloud/SocialCom/SustainCom 2022 · CC BY-SA 4.0
Abstract: Resource underutilization is one of the most troublesome problems faced by almost all deep learning processors. To address this challenge, researchers have proposed various methods, especially ones exploiting the parallelism of deep neural network models. However, different forms of parallelism interact with each other; for example, increasing intra-operator parallelism can suppress inter-operator parallelism, resulting in wasted resources. To make full use of chip resources, this paper applies the idea of elastic scheduling at the driver level of the deep learning processor to achieve elastic parallelism. That is, if a task consists of multiple independent subtasks, a subtask can be executed as long as its minimum resource requirement is met. In addition, based on a greedy scheduling algorithm, several improved methods are proposed, including subtask awareness, NUMA awareness, and cluster-alignment awareness. Our analysis and preliminary evaluation demonstrate that, by reducing intra-operator parallelism, elastic parallelism can help improve chip utilization, and that our proposed scheduling optimizations can further improve overall system performance.
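The Python sketch below illustrates the elastic-scheduling idea summarized in the abstract: independent subtasks are admitted greedily as soon as their minimum resource requirements fit, and any leftover cores are handed back to raise intra-operator parallelism. The names (`Subtask`, `min_cores`, `greedy_schedule`) and the simple core-count resource model are illustrative assumptions, not the paper's actual driver interface.

```python
# Minimal sketch of elastic subtask scheduling (illustrative only; the
# Subtask type, min_cores/max_cores fields, and greedy_schedule function
# are assumptions, not the paper's API).
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Subtask:
    name: str
    min_cores: int   # smallest share the subtask can run on
    max_cores: int   # cores usable at full intra-operator parallelism

def greedy_schedule(subtasks: List[Subtask], free_cores: int) -> Dict[str, int]:
    """Greedily admit independent subtasks whenever their minimum
    requirement fits, trading intra-operator parallelism for more
    concurrent subtasks (inter-operator parallelism)."""
    placement: Dict[str, int] = {}
    # Admit each subtask at its minimum footprint first.
    for st in sorted(subtasks, key=lambda s: s.min_cores):
        if st.min_cores <= free_cores:
            placement[st.name] = st.min_cores
            free_cores -= st.min_cores
    # Return leftover cores to admitted subtasks, up to their maximum.
    for st in subtasks:
        if st.name in placement and free_cores > 0:
            extra = min(st.max_cores - placement[st.name], free_cores)
            placement[st.name] += extra
            free_cores -= extra
    return placement

if __name__ == "__main__":
    ops = [Subtask("conv1", 2, 8), Subtask("conv2", 2, 8), Subtask("fc", 1, 4)]
    print(greedy_schedule(ops, free_cores=8))
    # -> {'fc': 1, 'conv1': 5, 'conv2': 2}
```

A production scheduler at the driver level would additionally account for subtask, NUMA, and cluster-alignment awareness as proposed in the paper; this sketch only shows the basic elastic admission policy.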