Abstract: Resource underutilization is one of the most troublesome problems faced by almost all deep learning processors. To address this challenge, researchers have proposed various methods, especially ones that exploit the parallelism of deep neural network models. However, different forms of parallelism can interfere with one another; for example, intra-operator parallelism has adverse effects on inter-operator parallelism, resulting in wasted resources. To make full use of chip resources, this paper applies the idea of elastic scheduling at the driver level of the deep learning processor to achieve elastic parallelism: if a task consists of multiple independent subtasks, a subtask can be executed as soon as its minimum resource requirements are met. In addition, building on a greedy scheduling algorithm, several improvements are proposed, including subtask awareness, NUMA awareness, and cluster-alignment awareness. Our analysis and preliminary evaluation demonstrate that, by reducing intra-operator parallelism, elastic parallelism helps improve chip utilization, and that our proposed scheduling optimizations further improve overall system performance.
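The elastic-parallelism idea described in the abstract can be illustrated with a minimal greedy admission sketch. All names below (`Subtask`, `min_cores`, `elastic_schedule`) are illustrative assumptions for exposition, not the paper's actual driver interface:

```python
from dataclasses import dataclass

@dataclass
class Subtask:
    # Illustrative: each subtask declares only its *minimum* resource need,
    # so the scheduler can launch it without reserving the whole task's share.
    name: str
    min_cores: int

@dataclass
class Task:
    name: str
    subtasks: list

def elastic_schedule(tasks, total_cores):
    """Greedy elastic admission (sketch): dispatch any independent subtask
    whose minimum resource requirement fits in the currently free cores,
    rather than waiting until the entire parent task can be placed."""
    free = total_cores
    running = []
    for task in tasks:
        for sub in task.subtasks:
            if sub.min_cores <= free:
                free -= sub.min_cores
                running.append((task.name, sub.name, sub.min_cores))
    return running, free
```

For example, with 8 free cores, a task whose two independent subtasks each need 3 cores can start both immediately, even though a later 4-core subtask must wait; a non-elastic scheduler that reserves resources per whole task would leave those cores idle. The paper's subtask-, NUMA-, and cluster-alignment-aware refinements would replace the simple first-fit loop above with better-informed placement decisions.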