Keywords: conditional computation, routing models, depth
TL;DR: Per-example routing models benefit from architectural diversity, but still struggle to scale to a large number of routing decisions.
Abstract: Routing models, a form of conditional computation where examples are routed through a subset of components in a larger network, have shown promising results in recent work. Surprisingly, routing models to date have lacked important properties, such as architectural diversity and large numbers of routing decisions. Both architectural diversity and routing depth can increase the representational power of a routing network. In this work, we address both of these deficiencies. We discuss the significance of architectural diversity in routing models, and explain the tradeoffs between capacity and optimization when increasing routing depth. In our experiments, we find that adding architectural diversity to routing models significantly improves performance, cutting the error rate of a strong baseline by 35% on an Omniglot setup. However, when scaling up routing depth, we find that modern routing techniques struggle with optimization. We conclude by discussing both the positive and negative results, and suggest directions for future research.
Data: [Birdsnap](https://paperswithcode.com/dataset/birdsnap), [Food-101](https://paperswithcode.com/dataset/food-101), [Stanford Cars](https://paperswithcode.com/dataset/stanford-cars)