Abstract: We study the problem of training deep fully connected neural networks. Despite much progress in the design of activation functions, normalization techniques, and skip connections, such networks remain challenging to train due to vanishing or exploding gradients. Our method is based on assigning a different class-dependent learning rate to each network weight. Since the learning rates are hyperparameters and not part of the network, we perform an analytical continuation of the network and create a generalized network. Following this reparameterization, the set of per-class, per-weight learning rates is manipulated during the training iterations. Our results show that the new algorithm leads to improved classification accuracy for both classical and modern activation functions.
Keywords: adaptive learning rates, analytical continuation, fully connected networks
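The abstract does not give implementation details, but the core reparameterization idea can be illustrated with a minimal sketch: each weight is written as the product of a base parameter and a per-weight scale, so the scale acts as a per-weight learning-rate multiplier that can be manipulated between iterations. All names here (`ReparamLinear`, `rate`) are illustrative assumptions, not the authors' code, and the sketch omits the class-dependent aspect of the method.

```python
# Minimal sketch (assumption, not the paper's implementation): per-weight
# learning rates absorbed into the model as a multiplicative reparameterization.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ReparamLinear(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.v = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_features))
        # Per-weight multiplier; a buffer so the optimizer does not update it,
        # but training code can adjust it between iterations.
        self.register_buffer("rate", torch.ones(out_features, in_features))

    def forward(self, x):
        # Effective weight is rate * v, so the gradient of v is scaled
        # element-wise by the corresponding rate entry.
        return F.linear(x, self.rate * self.v, self.bias)


layer = ReparamLinear(4, 3)
out = layer(torch.randn(2, 4))
# Example manipulation between training iterations: rescale selected rates.
with torch.no_grad():
    layer.rate.mul_(1.05)
```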