- Abstract: It has been believed that stochastic feedforward neural networks (SFNN) have several advantages beyond deterministic deep neural networks (DNN): they have more expressive power allowing multi-modal mappings and regularize better due to their stochastic nature. However, training SFNN is notoriously harder. In this paper, we aim at developing efficient training methods for large-scale SFNN, in particular using known architectures and pre-trained parameters of DNN. To this end, we propose a new intermediate stochastic model, called Simplified-SFNN, which can be built upon any baseline DNN and approximates certain SFNN by simplifying its upper latent units above stochastic ones. The main novelty of our approach is in establishing the connection between three models, i.e., DNN -> Simplified-SFNN -> SFNN, which naturally leads to an efficient training procedure of the stochastic models utilizing pre-trained parameters of DNN. Using several popular DNNs, we show how they can be effectively transferred to the corresponding stochastic models for both multi-modal and classification tasks on MNIST, TFD, CIFAR-10, CIFAR-100 and SVHN datasets. In particular, our stochastic model built from the wide residual network has 28 layers and 36 million parameters, where the former consistently outperforms the latter for the classification tasks on CIFAR-10 and CIFAR-100 due to its stochastic regularizing effect.
- Keywords: Deep learning, Multi-modal learning, Structured prediction
- Conflicts: kaist.ac.kr, kaist.edu