Deconstructing the Ladder Network Architecture

Mohammad Pezeshki, Linxi Fan, Philemon Brakel, Aaron Courville, Yoshua Bengio

Feb 18, 2016 (modified: Feb 18, 2016) ICLR 2016 workshop submission readers: everyone
  • CMT id: 292
  • Abstract: The Ladder Network is a recent new approach to semi-supervised learning that turned out to be very successful. While showing impressive performance, the Ladder Network has many components intertwined, whose contributions are not obvious in such a complex architecture. This paper presents an extensive experimental investigation of variants of the Ladder Network in which we replaced or removed individual components to learn about their relative importance. For semi-supervised tasks, we conclude that the most important contribution is made by the lateral connections, followed by the application of noise, and the choice of what we refer to as the `combinator function'. As the number of labeled training examples increases, the lateral connections and the reconstruction criterion become less important, with most of the generalization improvement coming from the injection of noise in each layer. Finally, we introduce a combinator function that reduces test error rates on Permutation-Invariant MNIST to 0.57\% for the supervised setting, and to 0.97 % and 1.0 % for semi-supervised settings with 1000 and 100 labeled examples, respectively.
  • Conflicts: umontreal.ca, columbia.edu

Loading