- Decision: submitted, no decision
- Abstract: It is well known that directly training deep multi-layer neural networks (DNNs) generally leads to poor results. A major advance in recent years is the invention of various unsupervised pretraining methods for initializing network parameters, which have been shown to yield good prediction performance. However, the reason for the success of pretraining is not fully understood, although it has been argued that regularization and better optimization play certain roles. This paper provides another explanation for the effectiveness of pretraining: we show empirically that pretraining leads to sparser hidden unit activations in the resulting neural networks, and that higher sparseness is positively correlated with faster training and better prediction accuracy. Moreover, we show that rectified linear units (ReLUs) can capture the sparseness benefits of pretraining. Our implementation of DNNs with ReLUs does not require pretraining, yet achieves prediction performance comparable to or better than that of traditional DNNs with pretraining on standard benchmark datasets.
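
To make "sparseness of hidden unit activation" concrete, the sketch below compares one randomly initialized hidden layer under ReLU and sigmoid activations. It is an illustration under stated assumptions, not the paper's experimental setup: the abstract does not define its sparseness measure, so the fraction of near-zero activations is assumed here, and the layer sizes, Gaussian weights, and helper names (`hidden_activations`, `sparseness`) are all hypothetical.

```python
import math
import random


def relu(x):
    """Rectified linear unit: zero for non-positive inputs."""
    return max(0.0, x)


def sigmoid(x):
    """Logistic sigmoid: strictly positive for every finite input."""
    return 1.0 / (1.0 + math.exp(-x))


def hidden_activations(inputs, weights, biases, activation):
    """Activations of one fully connected hidden layer for one input vector."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def sparseness(activations, eps=1e-6):
    """Assumed measure: fraction of units with |activation| <= eps."""
    return sum(1 for a in activations if abs(a) <= eps) / len(activations)


# Hypothetical layer: 20 inputs, 200 hidden units, standard Gaussian weights.
rng = random.Random(0)
n_in, n_hidden = 20, 200
weights = [[rng.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(n_hidden)]
biases = [0.0] * n_hidden
x = [rng.gauss(0.0, 1.0) for _ in range(n_in)]

relu_sparseness = sparseness(hidden_activations(x, weights, biases, relu))
sigmoid_sparseness = sparseness(hidden_activations(x, weights, biases, sigmoid))
print(f"ReLU sparseness:    {relu_sparseness:.2f}")
print(f"sigmoid sparseness: {sigmoid_sparseness:.2f}")
```

With zero biases and symmetric random weights, roughly half of the ReLU units output exactly zero, whereas sigmoid units are almost never exactly zero. This only shows why ReLU layers tend to be sparse by construction; it says nothing about the training speed or accuracy results reported in the abstract.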