Abstract: We introduce an optimization method for convolutional neural networks that
improves the generalization error of existing gradient-based optimizers.
The method requires only simple processing of existing stochastic gradients,
can be used in conjunction with any optimizer, and has only a linear overhead (in
the number of parameters) compared to computation of the stochastic gradient.
The method works by computing the gradient of the loss function with respect to
output-channel directed re-weighted L2 or Sobolev metrics, which has the effect of
smoothing components of the gradient across a certain direction of the parameter
tensor. We show that defining the gradients along the output channel direction
leads to a performance boost, while other directions can be detrimental. We present
the continuum theory of such gradients, its discretization, and application to deep
networks. Experiments on benchmark datasets, several networks and baseline
optimizers show that optimizers can be improved in generalization error by simply
computing the stochastic gradient with respect to output-channel directed metrics.
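
As a rough illustration of what smoothing a gradient along the output-channel direction can look like, the sketch below applies an H1 (Sobolev-type) smoothing to a convolutional weight gradient. It is not the paper's exact discretization. It assumes a gradient tensor laid out as (output channels, input channels, height, width), a periodic boundary along the output-channel axis, and a hypothetical smoothing weight `lam`. The function name `sobolev_gradient_output_channel` is likewise illustrative. Under these assumptions the smoothed gradient solves (I - lam * Laplacian) g_s = g along axis 0, which is done here in the Fourier domain.

```python
import numpy as np

def sobolev_gradient_output_channel(grad, lam=1.0):
    """Smooth a conv-layer gradient along the output-channel axis (axis 0).

    Assumptions (not taken from the paper): periodic boundary along axis 0,
    an H1-type metric, and a hypothetical smoothing weight `lam`.
    """
    n_out = grad.shape[0]
    k = np.arange(n_out)
    # Eigenvalues of (I - lam * periodic discrete Laplacian) in the Fourier basis.
    denom = 1.0 + lam * (2.0 - 2.0 * np.cos(2.0 * np.pi * k / n_out))
    # Reshape so the eigenvalues broadcast over the remaining axes.
    denom = denom.reshape((n_out,) + (1,) * (grad.ndim - 1))
    g_hat = np.fft.fft(grad, axis=0)
    # Dividing by the eigenvalues inverts the operator; the zero-frequency
    # eigenvalue is 1, so the mean over output channels is left unchanged.
    return np.real(np.fft.ifft(g_hat / denom, axis=0))

# Example: smooth a random gradient for a layer with 8 output channels.
rng = np.random.default_rng(0)
g = rng.standard_normal((8, 3, 3, 3))
g_smooth = sobolev_gradient_output_channel(g, lam=2.0)
```

The smoothed tensor could then be handed to any baseline optimizer in place of the raw stochastic gradient. This FFT-based solve costs slightly more than linear time in the number of parameters; a tridiagonal solver would be one way to keep the overhead linear.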