Optimal GPU-CPU Offloading Strategies for Deep Neural Network Training

Olivier Beaumont, Lionel Eyraud-Dubois, Alena Shilova

Published: 2020, Last Modified: 14 May 2023, Euro-Par 2020

Abstract: Training deep neural networks is known to be expensive in terms of both computational cost and memory load. Indeed, during training, all intermediate layer outputs (called activations) computed during the forward phase must be stored until the corresponding gradient has been computed in the backward phase. These memory requirements sometimes prevent the use of larger batch sizes and deeper networks, which can limit both convergence speed and accuracy. Recent works have proposed offloading some of the computed forward activations from device memory to main memory, which requires determining which activations should be offloaded and when these transfers should take place. We prove that this problem is NP-complete in the strong sense and propose two heuristics based on relaxations of the problem. We then conduct a thorough experimental evaluation on standard deep neural networks.
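To make the "which activations and when" trade-off concrete, here is a minimal sketch that simulates a chain of layers under a toy offloading model. It is an illustrative assumption, not the paper's formulation or either of its heuristics. The names `Layer` and `simulate` are hypothetical. The model assumes one compute stream and one host-device channel that serves transfers in FIFO order while computation continues, with transfer time equal to size divided by bandwidth. It also assumes that the prefetch of an offloaded activation is issued no earlier than the start of the backward step of the following layer (or the end of the forward phase for the last layer). Given a set of offloaded layer indices, the simulation reports the makespan and the peak device memory.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    fwd: float   # forward compute time
    bwd: float   # backward compute time
    size: float  # size of the activation kept for the backward pass


def simulate(layers, offload, bandwidth):
    """Return (makespan, peak_device_memory) for a chain of layers.

    Toy model (an assumption, not the paper's formulation): one compute
    stream, one host<->device channel serving transfers in FIFO order while
    compute continues, transfer time = size / bandwidth, and the prefetch of
    an offloaded activation is issued when the backward step of the layer
    after it starts (or at the end of the forward phase for the last layer).
    """
    n = len(layers)
    intervals = []        # (alloc_time, free_time, size) of device buffers
    channel_free = 0.0    # time at which the transfer channel becomes idle
    fwd_start = [0.0] * n
    offload_end = [0.0] * n

    # Forward phase: layers run back to back; an offload starts once the
    # activation is computed and the channel is idle.
    t = 0.0
    for i, layer in enumerate(layers):
        fwd_start[i] = t
        t += layer.fwd
        if i in offload:
            start = max(t, channel_free)
            channel_free = offload_end[i] = start + layer.size / bandwidth

    # Backward phase, in reverse order: an offloaded activation must be back
    # on the device before its backward step can start.
    prev_start = t        # start of the previous backward step (or end of forward)
    for i in reversed(range(n)):
        layer = layers[i]
        ready = t         # t = end of the previous backward step
        if i in offload:
            pf_start = max(channel_free, prev_start)
            channel_free = pf_start + layer.size / bandwidth
            ready = max(ready, channel_free)
        bwd_end = ready + layer.bwd
        if i in offload:
            # Resident until the offload completes, then again from prefetch start.
            intervals.append((fwd_start[i], offload_end[i], layer.size))
            intervals.append((pf_start, bwd_end, layer.size))
        else:
            intervals.append((fwd_start[i], bwd_end, layer.size))
        prev_start, t = ready, bwd_end

    # Sweep allocation/free events; at equal times, frees (negative deltas) sort first.
    events = sorted([(a, s) for a, _, s in intervals] + [(f, -s) for _, f, s in intervals])
    live = peak = 0.0
    for _, delta in events:
        live += delta
        peak = max(peak, live)
    return t, peak


layers = [Layer(fwd=1.0, bwd=2.0, size=1.0) for _ in range(4)]
print(simulate(layers, offload=set(), bandwidth=1.0))      # (12.0, 4.0)
print(simulate(layers, offload={0, 1}, bandwidth=1.0))     # (12.0, 2.0)
print(simulate(layers, offload={0, 1}, bandwidth=0.25))    # (19.0, 4.0)
```

In this toy setting, offloading the first two activations over a fast channel halves the peak memory at no cost in time. Over a slow channel, the same choice saves no memory, because the offloaded buffers are still resident when the peak occurs, and the prefetches stall the backward pass. The gap between these two outcomes is why the abstract frames the problem as choosing both which activations to offload and when to transfer them.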
