Abstract: Training methods for neural networks are primarily variants of stochastic gradient descent. Techniques that use (approximate) second-order information are rarely applied because of the computational cost and noise associated with those approaches in deep learning contexts. We show that feedforward and recurrent neural networks exhibit an outer product derivative structure, but that convolutional neural networks do not. This structure makes it possible to use higher-order information without approximations or a significant increase in computational cost.
TL;DR: We show that feedforward and recurrent neural networks exhibit an outer product derivative structure, which makes it possible to use higher-order information without approximations or a significant increase in computational cost.
Keywords: Deep Learning, Training Methods, Gradient Calculations
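As a concrete illustration of the outer product derivative structure the abstract refers to, the sketch below checks, for a single fully connected layer with a squared-error loss, that the gradient with respect to the weight matrix factors as the outer product of the backpropagated error and the layer input. This is a minimal NumPy sketch under assumed names and a chosen loss, not the paper's method or code.

```python
# Minimal sketch (not the authors' implementation): for a fully connected
# layer y = W x with loss L, the gradient dL/dW factors as the outer product
# of the backpropagated error delta = dL/dy and the layer input x.
import numpy as np

rng = np.random.default_rng(0)

x = rng.normal(size=3)          # layer input
W = rng.normal(size=(2, 3))     # weight matrix
t = rng.normal(size=2)          # target (squared-error loss chosen for illustration)

def loss(W):
    y = W @ x                   # layer output (bias omitted for brevity)
    return 0.5 * np.sum((y - t) ** 2)

# Analytic gradient via the outer-product structure: dL/dW = delta x^T.
delta = W @ x - t               # backpropagated error dL/dy
grad_outer = np.outer(delta, x)

# Central-difference numerical gradient for comparison.
grad_num = np.zeros_like(W)
eps = 1e-6
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        Wp = W.copy(); Wp[i, j] += eps
        Wm = W.copy(); Wm[i, j] -= eps
        grad_num[i, j] = (loss(Wp) - loss(Wm)) / (2 * eps)

print(np.allclose(grad_outer, grad_num, atol=1e-5))  # True
```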