Towards Robust and Efficient Continual Language Learning
Abstract: As the application space of language models
continues to evolve, a natural question to ask
is how we can quickly adapt models to new
tasks. We approach this classic question from
a continual learning perspective, in which we
aim to take models fine-tuned on past tasks and
continue fine-tuning them on new tasks, with the goal of
“transferring” relevant knowledge. However,
this strategy also runs the risk of doing more
harm than good, i.e., negative transfer. In this
paper, we construct a new benchmark of task
sequences that target different possible transfer
scenarios one might face, such as a sequence
of tasks with high potential for positive transfer,
high potential for negative transfer, no expected
effect, or a mixture of these. An ideal learner
should be able to maximally exploit information from all tasks that have any potential
for positive transfer, while also avoiding the
negative effects of any distracting tasks that
may confuse it. We then propose a simple
yet effective learner that satisfies many of our
desiderata simply by leveraging a selective
strategy for initializing new models from past
task checkpoints. Still, limitations remain, and
we hope this benchmark can help the community to further build and analyze such learners.
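To make the idea concrete, the sketch below illustrates one way a selective checkpoint-initialization strategy could look. It is a minimal illustration, not the paper's exact procedure: the function names, the proxy-scoring step, and the candidate checkpoint paths are assumptions introduced here for exposition.

```python
# A minimal sketch (assumed, not the paper's exact method) of selective
# initialization: before training on a new task, score each past-task
# checkpoint (plus the original pretrained model) with a cheap proxy,
# e.g. validation performance from a brief probe, and initialize from
# the best-scoring candidate.

from typing import Callable, Dict


def select_initialization(
    candidate_checkpoints: Dict[str, str],   # task name -> checkpoint path
    proxy_score: Callable[[str], float],     # higher = better expected transfer
    pretrained_base: str = "base_model",     # hypothetical path to the base model
) -> str:
    """Pick the checkpoint to initialize the new-task model from."""
    # Always include the pretrained base so that a distractor-heavy history
    # can be ignored entirely (guarding against negative transfer).
    candidates = dict(candidate_checkpoints)
    candidates.setdefault("pretrained", pretrained_base)

    scores = {name: proxy_score(path) for name, path in candidates.items()}
    best = max(scores, key=scores.get)
    return candidates[best]


if __name__ == "__main__":
    # Toy usage: pretend the proxy is a precomputed transfer estimate.
    fake_scores = {"ckpt/nli": 0.71, "ckpt/qa": 0.64, "base_model": 0.58}
    best_ckpt = select_initialization(
        candidate_checkpoints={"nli": "ckpt/nli", "qa": "ckpt/qa"},
        proxy_score=lambda path: fake_scores[path],
    )
    print(f"Initialize new-task model from: {best_ckpt}")
```

The key design choice this sketch highlights is that the pretrained base is always kept as a fallback candidate, so the learner can decline to transfer at all when every past task looks like a distractor.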