Abstract: The successes of deep learning in recent years has been fueled by the development of innovative new neural network architectures. However, the design of a neural network architecture remains a difficult problem, requiring significant human expertise as well as computational resources. In this paper, we propose a method for transforming a discrete neural network architecture space into a continuous and differentiable form, which enables the use of standard gradient-based optimization techniques for this problem, and allows us to learn the architecture and the parameters simultaneously. We evaluate our methods on the Udacity steering angle prediction dataset, and show that our method can discover architectures with similar or better predictive accuracy but significantly fewer parameters and smaller computational cost.
Keywords: architecture search