Abstract: The widespread popularity of deep neural networks (DNNs) has made their training an important workload in modern datacenters. Training DNNs is both computation-intensive and memory-intensive. While prior work focuses on training parallelization (e.g., data parallelism and model parallelism) and model compression schemes (e.g., pruning and quantization) to reduce training time, choosing an appropriate data layout for input feature maps also plays an important role and is orthogonal to parallelization and compression in determining overall training performance. However, finding an optimal data layout is non-trivial, since the preferred layout varies across DNN models and across the pruning schemes applied to them. In this paper, we propose a simple yet effective data layout arbitration framework that automatically picks the most beneficial data layout for different DNNs under different pruning schemes. The proposed framework is built upon a formulated cache estimation model. Experimental results indicate that our approach always selects the most beneficial data layout and improves average training performance by 14.3% and 3.1% compared to uniformly using either of two popular data layouts.
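To make the arbitration idea concrete, below is a minimal Python sketch of how such a framework might choose between two common feature-map layouts, assuming the "two popular data layouts" referenced above are NCHW and NHWC. The names `ConvLayer`, `estimated_misses`, and `arbitrate`, as well as the cost formula itself, are illustrative placeholders; the paper's formulated cache estimation model is not reproduced here.

```python
# Hypothetical sketch of a data layout arbitration step: for each layer,
# estimate the cache cost of candidate layouts (NCHW vs. NHWC here) and
# pick the cheaper one. The cost model below is an illustrative placeholder,
# NOT the paper's formulated cache estimation model.

from dataclasses import dataclass

CACHE_LINE_BYTES = 64  # assumed cache line size
ELEMENT_BYTES = 4      # float32 feature maps

@dataclass
class ConvLayer:
    n: int              # batch size
    c: int              # channels
    h: int              # feature map height
    w: int              # feature map width
    density: float = 1.0  # fraction of weights kept after pruning

def estimated_misses(layer: ConvLayer, layout: str) -> float:
    """Toy cost: contiguous elements along the innermost dimension share
    cache lines; pruning is modeled as fragmenting accesses across channels,
    which penalizes channel-innermost layouts."""
    elems = layer.n * layer.c * layer.h * layer.w
    lines = elems * ELEMENT_BYTES / CACHE_LINE_BYTES
    if layout == "NCHW":
        # Width is innermost: spatial reuse, unaffected by channel pruning.
        return lines
    if layout == "NHWC":
        # Channels are innermost: pruning fragments channel accesses.
        return lines * (1.0 + (1.0 - layer.density))
    raise ValueError(f"unknown layout: {layout}")

def arbitrate(layer: ConvLayer, candidates=("NCHW", "NHWC")) -> str:
    """Pick the candidate layout with the lowest estimated cache cost."""
    return min(candidates, key=lambda fmt: estimated_misses(layer, fmt))

if __name__ == "__main__":
    dense = ConvLayer(n=32, c=64, h=56, w=56, density=1.0)
    pruned = ConvLayer(n=32, c=64, h=56, w=56, density=0.3)
    print(arbitrate(dense))   # costs tie at density=1.0; first candidate wins
    print(arbitrate(pruned))  # toy model favors NCHW once pruning fragments channels
```

The sketch only illustrates the selection mechanism: a per-layer cost estimate over candidate layouts, followed by an argmin. In practice the paper's cache model would account for the actual access patterns of pruned convolutions rather than the crude density penalty used here.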