On Adversarial Robustness of Small vs Large Batch Training

Sandesh Kamath; Amit Despande; K V Subrahmanyam

On Adversarial Robustness of Small vs Large Batch Training

Sandesh Kamath, Amit Despande, K V Subrahmanyam

17 May 2019 (modified: 05 May 2023)Submitted to ICML Deep Phenomena 2019Readers: Everyone

Abstract: Large-batch training is known to incur poor generalization by Jastrzebski et al. (2017) as well as poor adversarial robustness by Yao et al. (2018b). Hessian-based analysis of large-batch training by Yao et al. (2018b) concludes that adversarial training as well as small-batch training leads to lower Hessian spectrum. They combine adversarial training and second order information to come up with a new large-batch training algorithm to obtain robust models with good generalization. In this paper, we empirically observe that networks trained with constant learning rate to batch size ratio as proposed by Jastrzebski et al. (2017) not only have better generalization but also have roughly constant adversarial robustness across all batch sizes.

1 Reply

Loading