- Abstract: Large-batch training is known to generalize poorly (Jastrzebski et al., 2017) and to yield poor adversarial robustness (Yao et al., 2018b). The Hessian-based analysis of large-batch training by Yao et al. (2018b) concludes that adversarial training, like small-batch training, leads to a lower Hessian spectrum. They combine adversarial training with second-order information to derive a new large-batch training algorithm that produces robust models with good generalization. In this paper, we empirically observe that networks trained with a constant learning rate to batch size ratio, as proposed by Jastrzebski et al. (2017), not only generalize better but also have roughly constant adversarial robustness across all batch sizes.
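
The constant learning rate to batch size ratio mentioned in the abstract can be illustrated with a minimal sketch. The function name `scaled_learning_rate` and the reference values (learning rate 0.1 at batch size 128) are assumptions chosen for illustration and are not taken from the paper.

```python
def scaled_learning_rate(base_lr, base_batch_size, batch_size):
    """Return the learning rate that keeps lr / batch_size constant.

    base_lr and base_batch_size define the reference ratio; both are
    hypothetical values supplied by the caller, not settings from the paper.
    """
    ratio = base_lr / base_batch_size  # the quantity held fixed
    return ratio * batch_size


# Hypothetical reference setting: lr = 0.1 at batch size 128.
for batch_size in (128, 256, 512, 1024):
    lr = scaled_learning_rate(0.1, 128, batch_size)
    print(f"batch_size={batch_size:5d}  lr={lr:.4f}  ratio={lr / batch_size:.6f}")
```

Under this rule, doubling the batch size doubles the learning rate, so the printed ratio is the same for every batch size.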