Keywords: Deep learning, Second-order optimization, K-FAC, Shampoo, Large Batch Training
TL;DR: We empirically identify a condition under which second-order optimization is important.
Abstract: While numerous second-order optimization methods have been proposed to accelerate training in deep learning, they are seldom used in practice.
This is partly due to a limited understanding of the conditions under which second-order optimization outperforms first-order optimization.
This study aims to identify these conditions, particularly in terms of batch size and dataset size.
We find empirically that second-order optimization outperforms first-order optimization when the batch size is large and the dataset size is not too large.
Submission Number: 30