Dual Batch Size Training: An efficient MGD adaptive batch size method

Yuhang Du, Wenfeng Shen, Baohua Liu, Weijia Lu, Hao Gong

Published: 2021, Last Modified: 12 May 2023, ICTAI 2021

Abstract: Mini-batch Gradient Descent (MGD) has become a standard for deep learning model training. For a long time, the size of the mini-batch (also known as the batch size) was set empirically to a fixed value, but recent work has demonstrated that it can have crucial effects on training. Although several adaptive batch size methods already exist, they either add significant overhead to training or lack robustness across training scenarios. This work proposes an adaptive batch size method to accelerate MGD training. Its basic idea is to run training concurrently with two batch sizes and to choose the batch size by comparing their evaluated historical performance. The method can be easily implemented on various deep learning platforms, and the experimental results suggest that it achieves high performance and strong robustness with acceptable and controllable overhead, outperforming existing adaptive batch size methods.
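The abstract gives only the basic idea, so the following is a minimal illustrative sketch of that idea and not the authors' algorithm. It assumes a toy 1-D linear regression, runs the two candidate batch sizes one after the other rather than concurrently, gives both the same sample budget per round, and compares them by validation loss after each round. All names (`make_data`, `sgd_steps`, `dual_batch_train`) and all hyperparameter values are hypothetical.

```python
import random


def make_data(n, seed=0):
    # Synthetic data for y = 3x + 0.5 plus Gaussian noise (an assumption for this demo).
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, 3.0 * x + 0.5 + rng.gauss(0.0, 0.1)) for x in xs]


def mse(params, data):
    # Mean squared error of the linear model (w, b) on a dataset.
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def sgd_steps(params, data, batch_size, steps, lr, rng):
    # Plain mini-batch gradient descent for a fixed number of steps.
    w, b = params
    for _ in range(steps):
        batch = rng.sample(data, batch_size)
        gw = sum(2.0 * (w * x + b - y) * x for x, y in batch) / batch_size
        gb = sum(2.0 * (w * x + b - y) for x, y in batch) / batch_size
        w, b = w - lr * gw, b - lr * gb
    return (w, b)


def dual_batch_train(train, val, small=8, large=64, rounds=30,
                     samples_per_round=256, lr=0.1, seed=1):
    # Hypothetical selection rule: each round, train one copy of the model
    # per batch size from the same starting point, then keep the copy with
    # the lower validation loss. The paper's actual criterion may differ.
    rng = random.Random(seed)
    params = (0.0, 0.0)
    history = []  # (chosen batch size, validation loss) per round
    for _ in range(rounds):
        candidates = {}
        for bs in (small, large):
            steps = samples_per_round // bs  # equal sample budget per candidate
            cand = sgd_steps(params, train, bs, steps, lr, rng)
            candidates[bs] = (mse(cand, val), cand)
        best_bs = min(candidates, key=lambda bs: candidates[bs][0])
        params = candidates[best_bs][1]
        history.append((best_bs, candidates[best_bs][0]))
    return params, history


if __name__ == "__main__":
    data = make_data(512)
    train, val = data[:400], data[400:]
    params, history = dual_batch_train(train, val)
    print("final validation loss:", mse(params, val))
    print("batch sizes chosen per round:", [bs for bs, _ in history])
```

In this sketch the overhead is explicit: every round costs twice the sample budget of ordinary training, because both candidates are trained before one is discarded. How the published method keeps that overhead "acceptable and controllable" is not described in the abstract.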
