Abstract: Deep learning has shown that we can solve many problems by trading domain expertise for computation. This has created new domains of expertise in the training of deep networks, such as how to initialize learning. In this work, we introduce an algorithm called MetaInit as a step towards automating the search for good initializations using meta-learning. To guide the search, we hypothesize that a good inductive bias for initializations is to make first-order gradient descent easier by starting learning in regions where gradient descent is less affected by second-order effects. This is a generic criterion that makes no assumptions about the architecture and that can be minimized efficiently using gradient descent to tune the scales of the initial weight matrices. Our experiments on plain and residual networks show that the algorithm can automatically recover from a class of bad initializations and can train networks competitive with state-of-the-art results without normalization. While applying the method to large-scale problems is more difficult, we show that the algorithm can also be applied to Resnet-50 models on Imagenet.
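The criterion described above can be made concrete as a "gradient quotient": after one gradient step, the gradient changes from g to roughly g - Hg (H the Hessian), so the mean of |(g - Hg)/g - 1| measures how strongly second-order effects distort gradient descent at the current initialization. Below is a minimal, self-contained PyTorch sketch of this idea: it estimates the gradient quotient on random probe data and tunes only the norms of the weight matrices with sign-based gradient descent. The function names (gradient_quotient, metainit), the probe-data setup, and all hyperparameters here are illustrative assumptions, not necessarily the authors' released implementation.

```python
import torch
import torch.nn as nn

def gradient_quotient(loss, params, eps=1e-5):
    """Mean of |(g - Hg) / g - 1| over all parameters: how much one
    gradient step changes the gradient, to first order."""
    grad = torch.autograd.grad(loss, params, create_graph=True)
    # Hessian-vector product Hg, obtained by differentiating ||g||^2 / 2.
    prod = torch.autograd.grad(
        sum((g ** 2).sum() / 2 for g in grad), params, create_graph=True)
    # eps * sign(g) safeguards the division against tiny gradients.
    out = sum(((g - p) / (g + eps * (2 * (g >= 0).float() - 1).detach()) - 1)
              .abs().sum() for g, p in zip(grad, prod))
    return out / sum(p.numel() for p in params)

def metainit(model, criterion, x_size, num_classes,
             lr=0.1, momentum=0.9, steps=200, eps=1e-5):
    """Tune only the norms of the weight matrices by signSGD with
    momentum on the gradient quotient, estimated on random data."""
    params = [p for p in model.parameters()
              if p.requires_grad and p.dim() >= 2]  # weight matrices only
    memory = [0.0] * len(params)
    for _ in range(steps):
        x = torch.randn(*x_size)                      # random probe inputs
        y = torch.randint(0, num_classes, (x_size[0],))  # random labels
        gq = gradient_quotient(criterion(model(x), y),
                               list(model.parameters()), eps)
        grads = torch.autograd.grad(gq, params)
        for j, (p, g) in enumerate(zip(params, grads)):
            norm = p.data.norm().item()
            # Project the meta-gradient onto the scale direction p/||p||,
            # then take a sign step on the norm with momentum.
            step = torch.sign((p.data * g).sum() / norm).item()
            memory[j] = momentum * memory[j] - lr * step
            p.data.mul_((norm + memory[j]) / norm)
    return model

# Usage on a toy network (hypothetical sizes):
model = nn.Sequential(nn.Linear(32, 64), nn.Tanh(), nn.Linear(64, 10))
metainit(model, nn.CrossEntropyLoss(), x_size=(16, 32), num_classes=10)
```

Because only the per-matrix scales are updated, the procedure preserves the shape of the initial weight distribution and only searches over its magnitudes, which is what makes the meta-optimization cheap relative to full training.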