Keywords: meta-learning, deep learning
Abstract: Deep neural network training depends heavily on the choice of initial weight distribution, yet this choice is often nontrivial. Existing theoretical results for this problem mostly cover simple architectures, e.g., feedforward networks with ReLU activations. The architectures used for practical problems are more complex and often incorporate many overlapping modules, which makes them challenging to analyze theoretically. As a result, practitioners have to rely on heuristic initializers of questionable optimality and stability. In this study, we propose a task-agnostic approach that discovers initializers for specific network architectures and optimizers by learning the initial weight distributions directly via meta-learning. In several supervised and unsupervised learning scenarios, we show that our initializers yield both faster convergence and higher model performance.
Ethics Statement: Our work proposes an automated method for discovering better initializations for neural networks, which contributes to solving the long-standing problem of simplifying the use of neural networks for both academic and industrial users. We hope that our research will facilitate the widespread use of deep learning by non-expert users. In turn, this could benefit industries that do not yet have strong machine learning expertise, such as agriculture, mining, and transportation.