AdaBatch: Adaptive Batch Sizes for Training Deep Neural Networks
Aditya Devarakonda, Maxim Naumov, Michael Garland
Feb 06, 2018 (modified: Feb 06, 2018) · ICLR 2018 Workshop Submission
Abstract: We introduce a new deep learning training approach that adaptively increases the batch size during the training process. Our method delivers the convergence rate of small, fixed batch sizes while achieving performance similar to large, fixed batch sizes. We train the VGG and ResNet networks on the CIFAR-100 and ImageNet datasets. Our results show that learning with adaptive batch sizes can improve performance by factors of up to 6.25 on 4 NVIDIA Tesla P100 GPUs while attaining accuracies similar to those of small batch sizes. Using our technique, we are able to train ImageNet with batch sizes up to 524,288.
TL;DR: The batch size during CNN training can be adaptively increased to yield better performance and obtain accuracies similar to fixed batch size training.
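To illustrate the idea of an adaptive batch size schedule, here is a minimal sketch in plain Python. It assumes the batch size doubles every fixed number of epochs up to a cap; the paper's actual schedule, and any accompanying learning-rate adjustments, may differ, and the function name `adabatch_schedule` is a hypothetical illustration rather than an API from the paper.

```python
def adabatch_schedule(initial_batch_size, num_epochs, double_every, max_batch_size):
    """Return a per-epoch list of batch sizes that doubles periodically.

    This is a simplified illustration of the adaptive-batch-size idea:
    start small (good convergence), grow large (good hardware throughput).
    """
    schedule = []
    batch_size = initial_batch_size
    for epoch in range(num_epochs):
        # Double the batch size every `double_every` epochs, capped at the maximum.
        if epoch > 0 and epoch % double_every == 0:
            batch_size = min(batch_size * 2, max_batch_size)
        schedule.append(batch_size)
    return schedule


# Example: start at 128, double every 2 epochs, cap at 1024.
print(adabatch_schedule(128, 8, 2, 1024))
```

A training loop would then rebuild (or re-batch) its data loader at the epochs where the schedule changes, keeping small batches early for convergence and large batches late for throughput.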