AdaBatch: Adaptive Batch Sizes for Training Deep Neural Networks

Aditya Devarakonda, Maxim Naumov, Michael Garland

Feb 06, 2018, ICLR 2018 Workshop Submission
  • Abstract: We introduce a new deep learning training approach that adaptively increases the batch size during the training process. Our method delivers the convergence rate of small, fixed batch sizes while achieving performance similar to large, fixed batch sizes. We train the VGG and ResNet networks on the CIFAR-100 and ImageNet datasets. Our results show that learning with adaptive batch sizes can improve performance by factors of up to 6.25 on 4 NVIDIA Tesla P100 GPUs while attaining accuracies similar to those of small batch sizes. Using our technique, we are able to train ImageNet with batch sizes up to 524,288.
  • TL;DR: The batch size during CNN training can be adaptively increased to yield better performance and obtain similar accuracies to fixed batch size training.
  • Keywords: adaptive batch sizes, convolutional neural networks
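The abstract states only that the batch size is increased adaptively during training; it does not specify the schedule. A minimal sketch of one plausible realization, where the batch size doubles at a fixed epoch interval up to an optional cap (the `interval` parameter and the doubling rule are assumptions for illustration, not the paper's stated method):

```python
def adaptive_batch_schedule(base_batch, num_epochs, interval=20, max_batch=None):
    """Return a per-epoch batch-size schedule that starts at `base_batch`
    and doubles every `interval` epochs, optionally capped at `max_batch`.

    This is an illustrative schedule only; the actual AdaBatch adaptation
    rule is described in the paper, not in this abstract.
    """
    schedule = []
    batch = base_batch
    for epoch in range(num_epochs):
        if epoch > 0 and epoch % interval == 0:
            batch *= 2  # grow the batch as training progresses
            if max_batch is not None:
                batch = min(batch, max_batch)
        schedule.append(batch)
    return schedule
```

A schedule like this keeps early epochs at a small batch size (preserving the convergence behavior of small batches) while later epochs use large batches that better utilize the GPUs.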