Stochastic Gradient Descent Tricks

Léon Bottou

Published: 2012, Last Modified: 27 Jan 2025
Neural Networks: Tricks of the Trade (2nd ed.), 2012
CC BY-SA 4.0

Abstract: Chapter 1 strongly advocates the stochastic back-propagation method for training neural networks. This method is in fact an instance of a more general technique called stochastic gradient descent (SGD). This chapter provides background material, explains why SGD is a good learning algorithm when the training set is large, and offers useful recommendations.
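What makes back-propagation "stochastic" is that the parameters are updated from the gradient of the loss on a single randomly chosen example, not from the average gradient over the whole training set. The following is a minimal sketch of that idea. It assumes a one-dimensional linear model with squared loss and a step size that decays with the number of updates. These choices, along with the function name `sgd_linear_regression` and its parameters, are illustrative assumptions and not code from the chapter.

```python
import random


def sgd_linear_regression(data, lr0=0.1, decay=0.01, epochs=20, seed=0):
    """Fit y ~ w*x + b with plain stochastic gradient descent.

    Illustrative sketch (hypothetical helper, not from the chapter):
    each update uses the gradient of the squared loss on ONE example,
    rather than the average gradient over the whole training set.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    t = 0  # total number of updates performed so far
    examples = list(data)
    for _ in range(epochs):
        rng.shuffle(examples)  # visit the examples in a fresh random order each epoch
        for x, y in examples:
            lr = lr0 / (1.0 + decay * t)  # decreasing step size (assumed schedule)
            err = (w * x + b) - y  # residual of the current prediction
            # Gradient of 0.5 * err**2 with respect to w is err * x, and with respect to b is err.
            w -= lr * err * x
            b -= lr * err
            t += 1
    return w, b


# Noise-free points on the line y = 2x + 1, with x in [-1, 1].
points = [(k / 10.0, 2.0 * (k / 10.0) + 1.0) for k in range(-10, 11)]
w, b = sgd_linear_regression(points, epochs=200)
print(f"w = {w:.3f}, b = {b:.3f}")
```

Each step costs the same regardless of how many examples the training set holds. This per-update cost, independent of dataset size, is the property behind the abstract's claim that SGD suits large training sets. A full-batch gradient step would instead touch every example before changing `w` and `b` once.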