Learning with Plasticity Rules: Generalization and Robustness

Anonymous

30 Sept 2021 (modified: 05 May 2023) · NeurIPS 2021 Workshop MetaLearn Blind Submission · Readers: Everyone
Keywords: metalearning, plasticity, Hebbian learning, robustness, generalization
TL;DR: Metalearned plasticity rules generalize across networks and tasks and encourage adversarial robustness
Abstract: Brains learn robustly and generalize effortlessly between different learning tasks; in contrast, robustness and generalization across tasks are well-known weaknesses of artificial neural nets (ANNs). How can we use our accelerating understanding of the brain to improve these and other aspects of ANNs? Here we hypothesize that (a) brains employ synaptic plasticity rules that serve as proxies for Gradient Descent (GD); (b) these rules themselves can be learned by GD on the rule parameters; and (c) this process may be a missing ingredient for the development of ANNs that generalize well and are robust to adversarial perturbations. We provide both empirical and theoretical evidence for this hypothesis. In our experiments, plasticity rules for the synaptic weights of recurrent neural nets (RNNs) are learned through GD and are found to perform reasonably well (with no backpropagation). We find that plasticity rules learned by this process generalize from one type of data/classifier to others (e.g., rules learned on synthetic data work well on MNIST/Fashion MNIST) and converge with fewer updates. Moreover, the classifiers learned using plasticity rules exhibit surprising levels of tolerance to adversarial perturbations. Focusing on the last layer of a classification network, we show analytically that GD on the plasticity rule recovers (and can improve upon) the perceptron algorithm and the multiplicative weights method, and that the learned weights are provably robust to a quantifiable extent. Finally, we argue that applying GD to learning plasticity rules is biologically plausible, in the sense that they can be learned over evolutionary time: we show that, within the standard population genetic framework used to study evolution, natural selection of a numerical parameter over a sequence of generations provably simulates a simple variant of GD.
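To make the metalearning loop described in the abstract concrete, below is a minimal illustrative sketch in PyTorch: an inner loop trains a toy linear classifier using only a parameterized Hebbian-style update (no backpropagation through the task weights), while an outer loop applies GD to the rule coefficients through the unrolled inner loop. The rule form dW = eta·(A·pre·post + B·pre + C·post + D), the toy task, and names such as `plasticity_step` are assumptions for illustration, not the paper's actual rule or code.

```python
# Illustrative sketch only (not the paper's implementation):
# metalearning the coefficients of a Hebbian-style plasticity rule by GD.
import torch

torch.manual_seed(0)

def make_task(n=200, d=20):
    """Toy linearly separable binary classification task (stand-in for synthetic data)."""
    w_true = torch.randn(d)
    x = torch.randn(n, d)
    y = (x @ w_true > 0).float()
    return x, y

# Hypothetical rule: dW = eta * (A * pre*post + B * pre + C * post + D).
# The four coefficients (A, B, C, D) are the metalearned rule parameters.
rule_params = torch.zeros(4, requires_grad=True)
eta = 0.1
meta_opt = torch.optim.Adam([rule_params], lr=1e-2)

def plasticity_step(W, x, y_hat, y):
    A, B, C, D = rule_params
    pre = x                    # presynaptic activity (inputs)
    post = y - y_hat           # postsynaptic error-like signal
    dW = (A * post.unsqueeze(1) * pre   # Hebbian pre*post term
          + B * pre
          + C * post.unsqueeze(1)
          + D)
    return W + eta * dW.mean(0)         # average the update over the batch

for meta_step in range(500):
    x, y = make_task()
    W = torch.zeros(x.shape[1])         # fresh "synapses" for each task
    # Inner loop: the classifier learns via the plasticity rule alone
    # (no gradient descent on W itself).
    for _ in range(20):
        y_hat = torch.sigmoid(x @ W)
        W = plasticity_step(W, x, y_hat, y)
    # Outer loop: GD on the rule parameters through the unrolled inner loop.
    loss = torch.nn.functional.binary_cross_entropy(torch.sigmoid(x @ W), y)
    meta_opt.zero_grad()
    loss.backward()
    meta_opt.step()
```

Under these assumptions, the outer loop never touches the task weights W directly; it only shapes the update rule, which is what allows a learned rule to be reused on new data or architectures.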