Hard ASH: Sparsity and the right optimizer make a continual learner

Published: 19 Mar 2024 · Last Modified: 01 Apr 2024 · Tiny Papers @ ICLR 2024 (Notable) · CC BY 4.0
Keywords: Continual Learning, Lifelong Learning, Activation Functions, Optimizer
TL;DR: We show that our sparse activation function aids continual learning, and that replacing all continual learning tricks with plain Adagrad still retains surprisingly good performance.
Abstract: In class-incremental learning, neural networks typically suffer from catastrophic forgetting. We show that an MLP featuring a sparse activation function and an adaptive learning rate optimizer can compete with established regularization techniques on the Split-MNIST task. We highlight the effectiveness of the Adaptive SwisH (ASH) activation function in this context and introduce a novel variant, Hard Adaptive SwisH (Hard ASH), to further enhance learning retention.
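The setup the abstract describes can be sketched as an MLP with a sparse activation trained with Adagrad on sequential Split-MNIST tasks. Below is a minimal, hedged sketch of that idea: the `TopKSparse` module is a hypothetical top-k stand-in for a sparse activation (it is not the paper's ASH or Hard ASH definition, which is given in the full paper), and `split_mnist_task_loaders` is a placeholder name for per-task data loaders.

```python
# Sketch: MLP + sparse activation + Adagrad, with no replay or regularization tricks.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKSparse(nn.Module):
    """Hypothetical sparse activation: keep the top-k fraction of units per sample."""

    def __init__(self, keep_fraction: float = 0.1):
        super().__init__()
        self.keep_fraction = keep_fraction

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        k = max(1, int(self.keep_fraction * x.shape[-1]))
        # Threshold at the k-th largest pre-activation in each row; zero out the rest.
        thresholds = x.topk(k, dim=-1).values[..., -1:]
        return torch.where(x >= thresholds, F.silu(x), torch.zeros_like(x))


model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 256),
    TopKSparse(keep_fraction=0.1),  # stand-in for the paper's ASH / Hard ASH
    nn.Linear(256, 10),
)

# Adagrad's per-parameter adaptive learning rates are the only continual-learning
# mechanism used here, as suggested by the TL;DR.
optimizer = torch.optim.Adagrad(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

# Split-MNIST: train sequentially on class pairs (0/1, 2/3, ...) without replay.
# for task_loader in split_mnist_task_loaders:  # hypothetical per-task DataLoaders
#     for images, labels in task_loader:
#         optimizer.zero_grad()
#         loss = criterion(model(images), labels)
#         loss.backward()
#         optimizer.step()
```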
Submission Number: 141