Abstract: The success of deep learning in recent years has been fueled by the development
of innovative neural network architectures. However, the design of a neural
network architecture remains a difficult problem, requiring significant human expertise
as well as computational resources. In this paper, we propose a method
for transforming a discrete neural network architecture space into a continuous
and differentiable form, which enables the use of standard gradient-based optimization
techniques for this problem, and allows us to learn the architecture and
the parameters simultaneously. We evaluate our method on the Udacity steering
angle prediction dataset, and show that our method can discover architectures
with similar or better predictive accuracy but significantly fewer parameters and
lower computational cost.
Keywords: architecture search