SMASH: One-Shot Model Architecture Search through HyperNetworks

Andrew Brock; Theo Lim; J.M. Ritchie; Nick Weston

15 Feb 2018 (modified: 07 Apr 2024) | ICLR 2018 Conference Blind Submission | Readers: Everyone

Abstract: Designing architectures for deep neural networks requires expert knowledge and substantial computation time. We propose a technique to accelerate architecture selection by learning an auxiliary HyperNet that generates the weights of a main model conditioned on that model's architecture. By comparing the relative validation performance of networks with HyperNet-generated weights, we can effectively search over a wide range of architectures at the cost of a single training run. To facilitate this search, we develop a flexible mechanism based on memory read-writes that allows us to define a wide range of network connectivity patterns, with ResNet, DenseNet, and FractalNet blocks as special cases. We validate our method (SMASH) on CIFAR-10, CIFAR-100, STL-10, ModelNet10, and ImageNet32x32, achieving competitive performance with similarly-sized hand-designed networks.

TL;DR: A technique for accelerating neural architecture selection by approximating the weights of each candidate architecture instead of training them individually.
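The one-shot idea in the abstract can be illustrated with a deliberately tiny toy, not the authors' implementation. In this sketch, everything is an assumption made for illustration: an "architecture" is a binary mask over three input features, the "main model" is a masked linear regressor, and the "HyperNet" is a linear map `H` from the mask to the regressor's weights. A single training run samples a random architecture at every step, and candidates are then ranked by validation loss using only generated weights.

```python
import itertools
import random

rng = random.Random(0)
TRUE_COEFFS = [2.0, -3.0, 0.0]  # toy target: y = 2*x0 - 3*x1 (x2 is irrelevant)
N = len(TRUE_COEFFS)


def sample_point():
    """Draw one (x, y) pair from the toy regression task."""
    x = [rng.uniform(-1.0, 1.0) for _ in range(N)]
    y = sum(a * xi for a, xi in zip(TRUE_COEFFS, x))
    return x, y


# Hypothetical search space: every non-empty binary mask over the inputs.
ARCHS = [c for c in itertools.product([0, 1], repeat=N) if any(c)]

# Toy "HyperNet": a linear map H from the architecture encoding c to weights w.
H = [[0.0] * N for _ in range(N)]


def generate_weights(c):
    """Weights of the main model, generated from its architecture encoding."""
    return [sum(H[i][j] * c[j] for j in range(N)) for i in range(N)]


def predict(c, w, x):
    """Main model: a linear regressor that only sees the unmasked inputs."""
    return sum(c[i] * w[i] * x[i] for i in range(N))


# Single training run: sample a random architecture at every step and
# update only the HyperNet parameters H by SGD on the squared error.
LR = 0.02
for _ in range(8000):
    c = rng.choice(ARCHS)
    x, y = sample_point()
    err = predict(c, generate_weights(c), x) - y
    for i in range(N):
        for j in range(N):
            H[i][j] -= LR * 2.0 * err * c[i] * x[i] * c[j]

# Rank all candidates by validation loss using generated weights only;
# no candidate is trained individually.
val = [sample_point() for _ in range(200)]


def val_loss(c):
    w = generate_weights(c)
    return sum((predict(c, w, x) - y) ** 2 for x, y in val) / len(val)


ranked = sorted(((c, val_loss(c)) for c in ARCHS), key=lambda t: t[1])
for c, loss in ranked:
    print(c, round(loss, 3))
```

In this toy, the masks that keep both informative inputs should rank first, which mirrors the claim that relative validation performance under generated weights can guide the search. The real method operates on convolutional networks with a far richer architecture encoding.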

Keywords: meta-learning, architecture search, deep learning, computer vision

Code: [ajbrock/SMASH](https://github.com/ajbrock/SMASH)

Data: [CIFAR-10](https://paperswithcode.com/dataset/cifar-10), [CIFAR-100](https://paperswithcode.com/dataset/cifar-100), [ModelNet](https://paperswithcode.com/dataset/modelnet), [STL-10](https://paperswithcode.com/dataset/stl-10)

Community Implementations: [2 code implementations](https://www.catalyzex.com/paper/arxiv:1708.05344/code)
