Why do Models with Conditional Computation Learn Suboptimal Solutions?

22 Sept 2022 (modified: 13 Feb 2023), ICLR 2023 Conference Withdrawn Submission
Keywords: neural networks, conditional computation, gradient estimation
Abstract: Sparsely-activated neural networks with conditional computation learn to route their inputs through different subnetworks, providing a strong structural prior and reducing computational costs. Despite these potential benefits, models with learned routing often underperform both their parameter-matched densely-activated counterparts and models that use non-learned heuristic routing strategies. In this paper, we hypothesize that these shortcomings stem from the gradient estimation techniques used to train sparsely-activated models with non-differentiable, discrete routing decisions. To test this hypothesis, we evaluate the performance of sparsely-activated models trained with various gradient estimation techniques in three settings where a high-quality heuristic routing strategy can be designed. Our experiments reveal that learned routing reaches substantially different (and worse) solutions than heuristic routing in these settings. As a first step towards remedying this gap, we demonstrate that supervising the routing decision on a small fraction of the examples is sufficient to help the model learn better routing strategies. Our results shed light on the difficulties of learning effective routing and set the stage for future work on conditional computation mechanisms and training techniques.
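
To make the setup concrete, below is a minimal PyTorch sketch (not the paper's implementation; all names and hyperparameters are illustrative assumptions) of a top-1 routed layer trained with a straight-through estimator, one common gradient estimation technique for non-differentiable discrete routing, together with optional supervision of the router on a labeled subset of examples as the abstract describes.

import torch
import torch.nn as nn
import torch.nn.functional as F

class STERoutedLayer(nn.Module):
    """Top-1 routed layer whose discrete expert choice is trained with a
    straight-through gradient estimator. Illustrative sketch only."""

    def __init__(self, dim: int, num_experts: int):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList([nn.Linear(dim, dim) for _ in range(num_experts)])

    def forward(self, x: torch.Tensor, expert_labels: torch.Tensor = None):
        logits = self.router(x)                                        # (batch, num_experts)
        probs = F.softmax(logits, dim=-1)
        hard = F.one_hot(probs.argmax(dim=-1), probs.size(-1)).float()
        # Straight-through estimator: the forward pass uses the discrete
        # one-hot routing decision, while the backward pass treats it as
        # the differentiable softmax distribution.
        gate = hard + probs - probs.detach()
        # For clarity, every expert is evaluated densely here; a real
        # sparse implementation would dispatch each input to one expert.
        expert_out = torch.stack([e(x) for e in self.experts], dim=1)  # (batch, E, dim)
        out = (gate.unsqueeze(-1) * expert_out).sum(dim=1)
        # Optional routing supervision on the (small) subset of examples
        # whose correct expert is known; label -1 marks "unsupervised".
        aux_loss = x.new_zeros(())
        if expert_labels is not None:
            mask = expert_labels >= 0
            if mask.any():
                aux_loss = F.cross_entropy(logits[mask], expert_labels[mask])
        return out, aux_loss

# Hypothetical usage: supervise routing on a quarter of the batch and add
# the weighted auxiliary routing loss to an arbitrary task loss.
x = torch.randn(8, 16)
labels = torch.full((8,), -1)
labels[:2] = 0
layer = STERoutedLayer(16, num_experts=4)
out, aux = layer(x, labels)
loss = out.pow(2).mean() + 0.1 * aux

The weighting 0.1 and the fraction of routing-supervised examples are assumed values; the abstract's finding is only that a small supervised fraction suffices to steer the router toward better solutions.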
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Deep Learning and representational learning
Supplementary Material: zip