When Does Second-Order Optimization Speed Up Training?

Published: 19 Mar 2024, Last Modified: 10 May 2024
Venue: Tiny Papers @ ICLR 2024 Present
License: CC BY 4.0
Keywords: Deep learning, Second-order optimization, K-FAC, Shampoo, Large Batch Training
TL;DR: We empirically identify a condition under which using second-order optimization is important.
Abstract: While numerous second-order optimization methods have been proposed to accelerate training in deep learning, they are seldom used in practice. This is partly due to a limited understanding of the conditions under which second-order optimization outperforms first-order optimization. This study aims to identify these conditions, particularly in terms of batch size and dataset size. We find empirically that second-order optimization outperforms first-order optimization when the batch size is large and the dataset size is not too large.
Submission Number: 30