The Negative Pretraining Effect in Sequential Deep Learning and Three Ways to Fix It

28 Sept 2020 (modified: 05 May 2023) · ICLR 2021 Conference Blind Submission
Keywords: Transfer learning, Deep learning, Sequential learning, Critical learning periods, Curriculum learning
Abstract: Negative pretraining is a prominent sequential learning effect in neural networks, where a pretrained model achieves worse generalization on a target task than a model trained on that task from scratch. We conceptualize the ingredients of this problem setting and examine the negative pretraining effect experimentally through three interventions that remove it. First, acting on the learning process, altering the learning rate after pretraining can yield even better results than training directly on the target task. Second, at the task level, we intervene by increasing the number of discrete steps in the change of data distribution from the start task to the target task, instead of "jumping" directly to the target task. Finally, at the model level, resetting network biases to larger values likewise removes the negative pretraining effect, albeit to a smaller degree. With these intervention experiments, we aim to provide new evidence toward understanding the subtle influences that pretraining can have on a neural network's final generalization performance on a target task.
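The first intervention named in the abstract, restarting the learning rate after pretraining, can be illustrated with a minimal sketch. This is not the authors' code: the synthetic tasks, model size, and learning-rate values below are illustrative assumptions; the only point conveyed is that the optimizer is re-created with a fresh (here, larger) learning rate before training on the target task, rather than continuing the pretraining schedule.

```python
# Minimal sketch (assumed setup, not the paper's experiments) of the
# learning-rate intervention: pretrain on a source task, then restart the
# optimizer with a larger learning rate before training on the target task.
import torch
import torch.nn as nn

def make_task(n=1024, dim=20, seed=0):
    # Synthetic binary classification task; different seeds stand in for
    # the "start" and "target" tasks of the abstract.
    g = torch.Generator().manual_seed(seed)
    x = torch.randn(n, dim, generator=g)
    w = torch.randn(dim, generator=g)
    y = (x @ w > 0).float()
    return x, y

def train(model, x, y, lr, epochs=20):
    # A fresh optimizer is built here, so the learning rate is "reset"
    # each time this function is called.
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(x).squeeze(-1), y)
        loss.backward()
        opt.step()
    return loss.item()

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 1))

# Pretraining on the source task with a small learning rate.
x_src, y_src = make_task(seed=0)
train(model, x_src, y_src, lr=0.01)

# Intervention: train on the target task with a restarted, larger learning
# rate instead of continuing with the pretraining learning rate.
x_tgt, y_tgt = make_task(seed=1)
final_loss = train(model, x_tgt, y_tgt, lr=0.1)
print(f"target-task training loss after intervention: {final_loss:.4f}")
```

In this sketch the comparison of interest would be against (a) the same model trained on the target task from scratch and (b) the pretrained model fine-tuned with the original small learning rate; the abstract reports that the restarted-learning-rate variant can even outperform training directly on the target task.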
One-sentence Summary: We conceptualize and formalize the problem setting of the negative pretraining effect and offer three empirical interventions to fix it.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Supplementary Material: zip
Reviewed Version (pdf): https://openreview.net/references/pdf?id=BPbZ8TpGq