Abstract: Stochastic gradient descent (SGD) is a method commonly used to train neural networks. While SGD offers great flexibility in fine-tuning the optimization process, it can often lead to a tedious search for optimal hyperparameters, which include the batch size and the learning rate. These parameters not only affect the performance of the model but can also greatly impact the amount of time needed to train and test it. In their paper "Control Batch Size and Learning Rate to Generalize Well: Theoretical and Empirical Evidence," Fengxiang He, Tongliang Liu and Dacheng Tao outline a strategy for selecting the batch size and learning rate so as to increase the generalization ability of the model. To reproduce their findings, we train VGG-19, ResNet-50, Xception and a custom Convolutional Neural Network over a grid of batch sizes and learning rates. Our results lead to the same conclusion as He et al., demonstrating a positive correlation between the learning rate and test accuracy and a negative correlation between the batch size and the generalization ability of neural networks. Together, these results confirm that there exists a negative correlation between the ratio of batch size to learning rate and the test accuracy.
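The grid experiment described above can be summarized by a minimal sketch, assuming a TensorFlow/Keras setup, CIFAR-10 as the dataset, and a placeholder `build_model` helper; the report's actual architectures, grid values and training schedules differ.

```python
# Hedged sketch (not the report's exact code): sweep batch size and learning
# rate for one architecture and record test accuracy for each configuration.
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

def build_model():
    # Placeholder network; the report trains VGG-19, ResNet-50, Xception and
    # a custom CNN rather than this small example architecture.
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(32, 32, 3)),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])

results = []
for batch_size in [16, 64, 256]:            # example grid values (assumed)
    for learning_rate in [0.001, 0.01, 0.1]:
        model = build_model()
        model.compile(
            optimizer=tf.keras.optimizers.SGD(learning_rate=learning_rate),
            loss="sparse_categorical_crossentropy",
            metrics=["accuracy"],
        )
        model.fit(x_train, y_train, batch_size=batch_size, epochs=5, verbose=0)
        _, test_acc = model.evaluate(x_test, y_test, verbose=0)
        # Quantity of interest: test accuracy versus the ratio batch_size / lr.
        results.append((batch_size / learning_rate, test_acc))
```

Plotting `results` (ratio on the x-axis, test accuracy on the y-axis) is one way to visualize the negative correlation the abstract refers to.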
Track: Replicability
NeurIPS Paper Id: https://openreview.net/forum?id=BJfTE4BxUB&noteId=H1eNmFc7or