Abstract: Deep neural networks have recently achieved state-of-the-art performance thanks to new
training algorithms for rapid parameter estimation and new regularization techniques that reduce overfitting. In practice, however, the network
architecture has to be set manually by domain experts, generally through a costly
trial-and-error procedure, and this choice often accounts for a large portion of the final system's performance. We view this as a limitation and propose a novel
training algorithm that automatically optimizes the network architecture by progressively increasing model complexity and then eliminating redundancy, selectively removing
parameters at training time. For convolutional neural networks, our method relies on iterative split/merge clustering of convolutional kernels interleaved with stochastic gradient
descent. We present a training algorithm and experimental results on three different vision
tasks, showing improved performance compared to similarly sized hand-crafted architectures.
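To make the split/merge idea concrete, the sketch below shows one hypothetical way such a step could operate on flattened convolutional kernels: near-duplicate filters are merged (reducing redundancy), while filters with large spread are split into perturbed copies (growing capacity), with SGD epochs interleaved between steps. All function names, thresholds, and the clustering criterion here are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

def split_merge(kernels, split_thresh=1.0, merge_thresh=0.1, noise=0.01, rng=None):
    """kernels: (n_filters, k*k*c_in) array of flattened convolutional kernels."""
    rng = np.random.default_rng() if rng is None else rng
    new_kernels = []
    merged = set()
    for i, w in enumerate(kernels):
        if i in merged:
            continue
        # Merge: average filters that are almost identical (redundant parameters).
        dists = np.linalg.norm(kernels - w, axis=1)
        dup = [j for j in range(len(kernels))
               if j != i and j not in merged and dists[j] < merge_thresh]
        if dup:
            w = np.mean(np.vstack([w] + [kernels[j] for j in dup]), axis=0)
            merged.update(dup)
        new_kernels.append(w)
        # Split: duplicate filters with large spread, adding a small perturbation,
        # so the model can grow capacity where one filter appears "overloaded".
        if np.std(w) > split_thresh:
            new_kernels.append(w + noise * rng.standard_normal(w.shape))
    return np.vstack(new_kernels)

# Usage sketch: interleave split/merge steps with SGD epochs
# (the SGD update on the actual network weights is omitted here).
kernels = np.random.default_rng(0).standard_normal((16, 3 * 3 * 3))
for epoch in range(3):
    # ... run one or more SGD epochs on the current architecture here ...
    kernels = split_merge(kernels)
    print(epoch, kernels.shape)
```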