HyperNetworks

David Ha; Andrew M. Dai; Quoc V. Le

Published: 06 Feb 2017, Last Modified: 22 Jun 2025, ICLR 2017 Poster

Abstract: This work explores hypernetworks: an approach in which one network, known as a hypernetwork, generates the weights for another network. We apply hypernetworks to generate adaptive weights for recurrent networks. In this setting, hypernetworks can be viewed as a relaxed form of weight-sharing across layers. In our implementation, hypernetworks are trained jointly with the main network in an end-to-end fashion. Our main result is that hypernetworks can generate non-shared weights for LSTMs and achieve state-of-the-art results on a variety of sequence modelling tasks, including character-level language modelling, handwriting generation and neural machine translation, challenging the weight-sharing paradigm for recurrent networks.

TL;DR: We train a small RNN to generate weights for a larger RNN, and train the system end-to-end. We obtain state-of-the-art results on a variety of sequence modelling tasks.
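The idea of a small RNN generating weights for a larger one can be illustrated with a toy forward pass. The sketch below is a simplification under stated assumptions, not the paper's HyperLSTM: it uses plain tanh RNNs instead of LSTMs, has the small network emit a per-step scaling vector `d` that rescales the rows of the main network's recurrent matrix (`W_t = diag(d) W`), and omits training entirely (the paper trains both networks jointly end-to-end, which would require an autodiff framework). All class and variable names (`TinyHyperRNN`, `MainRNN`, `d`, `W_t`) are hypothetical.

```python
import math
import random


def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random Gaussian matrix (illustrative initialisation only)."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class TinyHyperRNN:
    """Hypothetical small RNN acting as the hypernetwork. Its hidden state
    is projected to a scaling vector d with one entry per row of the main
    network's recurrent matrix."""

    def __init__(self, in_dim, hyper_dim, main_dim, rng):
        self.W_x = rand_matrix(hyper_dim, in_dim, rng)
        self.W_h = rand_matrix(hyper_dim, hyper_dim, rng)
        self.W_d = rand_matrix(main_dim, hyper_dim, rng)
        self.h = [0.0] * hyper_dim

    def step(self, x):
        pre = [a + b for a, b in zip(matvec(self.W_x, x), matvec(self.W_h, self.h))]
        self.h = [math.tanh(p) for p in pre]
        # 1 + projection, so d stays near all-ones while weights are small.
        return [1.0 + v for v in matvec(self.W_d, self.h)]


class MainRNN:
    """Hypothetical vanilla tanh RNN whose recurrent weights are rescaled
    row-wise at every step: W_t = diag(d) @ W."""

    def __init__(self, in_dim, main_dim, rng):
        self.U = rand_matrix(main_dim, in_dim, rng)
        self.W = rand_matrix(main_dim, main_dim, rng)
        self.h = [0.0] * main_dim

    def step(self, x, d):
        # Generate this step's recurrent weights from the shared base matrix W.
        W_t = [[d_i * w for w in row] for d_i, row in zip(d, self.W)]
        pre = [a + b for a, b in zip(matvec(self.U, x), matvec(W_t, self.h))]
        self.h = [math.tanh(p) for p in pre]
        return self.h, W_t


# Usage: run both networks over a short one-hot input sequence.
rng = random.Random(0)
hyper = TinyHyperRNN(in_dim=3, hyper_dim=2, main_dim=4, rng=rng)
main = MainRNN(in_dim=3, main_dim=4, rng=rng)
inputs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

weights_per_step = []
for x in inputs:
    d = hyper.step(x)         # the small network emits this step's scaling vector
    h, W_t = main.step(x, d)  # the large network uses the generated weights
    weights_per_step.append(W_t)
```

Because `W_t` changes at every step while the parameters of both networks stay fixed, the main RNN's effective weights are no longer strictly shared across time steps; passing `d = [1.0] * 4` recovers an ordinary RNN with fully shared weights. This is one way to read the description of hypernetworks as a relaxed form of weight-sharing.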

Conflicts: google.com

Keywords: Natural language processing, Deep learning, Supervised Learning

Community Implementations: [5 code implementations](https://www.catalyzex.com/paper/hypernetworks/code)
