- Keywords: Deep Learning, Meta-learning, MAML, Sparsity
- Abstract: Finding neural network weights that generalize well from small datasets is difficult. A promising approach is to (meta-)learn a weight initialization from a collection of tasks, such that a small number of weight changes results in low generalization error. We show that this form of meta-learning can be improved by letting the learning algorithm decide which weights to change, i.e., by learning where to learn. We find that patterned sparsity emerges from this process. Lower-level features tend to be frozen, while weights close to the output remain plastic. This selective sparsity enables running longer sequences of weight updates with-out overfitting, resulting in better generalization in the miniImageNet benchmark. Our findings shed light on an ongoing debate on whether meta-learning can discover adaptable features, and suggest that sparse learning can outperform simpler feature reuse schemes.
- Proposed Reviewers: Johannes Oswald, email@example.com Nicolas Zucchet, firstname.lastname@example.org