Mixture-of-Variational-Experts for Continual Learning

09 Mar 2022, 10:53 (modified: 20 Apr 2022, 15:32) · ALOE@ICLR2022 · Readers: Everyone
Keywords: Continual Learning, Mixture-Of-Experts, Variational Bayes, Information Theory
TL;DR: We introduce the Mixture-of-Variational-Experts layer for task-agnostic continual learning. To improve expert specialization, we introduce a novel diversity objective.
Abstract: A key weakness of machine learning models is their poor ability to solve new problems without forgetting previously acquired knowledge. The Continual Learning (CL) paradigm has emerged as a protocol to systematically investigate settings where a model sequentially observes samples generated by a series of tasks. In this work, we take a task-agnostic view of continual learning and develop a hierarchical information-theoretic optimality principle that facilitates a trade-off between learning and forgetting. We discuss this principle from a Bayesian perspective and show its connections to previous approaches to CL. Based on this principle, we propose a neural network layer, called the Mixture-of-Variational-Experts layer, that alleviates forgetting by creating a set of information processing paths through the network, governed by a gating policy. Because the formulation is based on generic utility functions, this optimality principle applies to a wide variety of learning problems, including supervised learning, reinforcement learning, and generative modeling. We demonstrate the competitive performance of our method in continual supervised learning and in continual reinforcement learning.
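To make the layer's structure concrete: the abstract describes a set of expert paths through the network combined by a gating policy. The sketch below, in PyTorch, shows a generic mixture-of-experts layer with softmax gating; the class name, expert count, and use of plain linear experts are illustrative assumptions, not the paper's actual MoVE implementation (which uses variational experts and an information-theoretic objective).

```python
import torch
import torch.nn as nn


class MixtureOfExpertsLayer(nn.Module):
    """Hypothetical sketch of a mixture-of-experts layer: a gating
    network assigns soft weights over several expert sub-networks.
    This is NOT the paper's MoVE layer, only a generic illustration."""

    def __init__(self, in_dim: int, out_dim: int, num_experts: int = 4):
        super().__init__()
        # Each expert is one information processing path (here: a linear map).
        self.experts = nn.ModuleList(
            nn.Linear(in_dim, out_dim) for _ in range(num_experts)
        )
        # The gating policy scores experts per input sample.
        self.gate = nn.Linear(in_dim, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Soft gating distribution over experts, shape (batch, num_experts).
        weights = torch.softmax(self.gate(x), dim=-1)
        # Run all experts; stack to shape (batch, num_experts, out_dim).
        expert_outs = torch.stack([e(x) for e in self.experts], dim=1)
        # Weighted combination of the expert paths, shape (batch, out_dim).
        return (weights.unsqueeze(-1) * expert_outs).sum(dim=1)
```

In a continual-learning setting, the gate can route samples from different tasks to different experts, so new tasks need not overwrite the parameters of experts specialized on earlier ones.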