Learning To Avoid Negative Transfer in Few Shot Transfer Learning

James O' Neill

24 Mar 2019 (modified: 05 May 2023). Submitted to LLD 2019. Readers: Everyone

Keywords: few shot learning, negative transfer, cubic spline, ensemble learning

TL;DR: A dynamic bagging approach to avoiding negative transfer in neural network few-shot transfer learning

Abstract: Many tasks in natural language understanding, such as natural language inference, paraphrasing and entailment, require learning relationships between two sequences. These tasks are similar in nature, yet they are often modeled individually. Knowledge transfer can be effective for closely related tasks and is usually carried out using parameter transfer in neural networks. However, transferring all parameters, some of which are irrelevant for a target task, can lead to sub-optimal results and can have a negative effect on performance, referred to as negative transfer. Hence, this paper focuses on the transferability of both instances and parameters across natural language understanding tasks by proposing an ensemble-based transfer learning method in the context of few-shot learning. Our main contribution is a method for mitigating negative transfer across tasks when using neural networks, which involves dynamically bagging small recurrent neural networks trained on different subsets of the source task(s). We present a straightforward yet novel approach for incorporating these networks into a target task for few-shot learning by using a decaying parameter chosen according to the slope changes of a smoothed spline error curve at sub-intervals during training. Our proposed method shows improvements over hard and soft parameter sharing transfer methods in the few-shot learning case and, using only a few examples, shows competitive performance against models trained with full supervision on the target task.
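
The abstract does not give the exact update rule, so the following is only a minimal sketch of how a slope-driven decaying parameter might look. Everything here is an assumption for illustration: the function names `interval_slope` and `update_source_weight`, the `decay` factor, and the rule of shrinking the source weight when the smoothed error is not falling are all hypothetical. A least-squares cubic polynomial fit stands in for the smoothed spline the abstract mentions; a smoothing spline routine could be substituted.

```python
import numpy as np


def interval_slope(errors, degree=3):
    """Mean slope of a smoothed curve fitted to the errors of one sub-interval.

    Assumption: a least-squares polynomial (cubic by default) stands in
    for the smoothed spline error curve described in the abstract.
    """
    errors = np.asarray(errors, dtype=float)
    if errors.size < 2:
        raise ValueError("need at least two error values to estimate a slope")
    steps = np.arange(errors.size, dtype=float)
    # Cap the degree so the fit stays well-defined on short sub-intervals.
    coeffs = np.polyfit(steps, errors, deg=min(degree, errors.size - 1))
    # Evaluate the derivative of the fitted curve at every step and average it.
    slopes = np.polyval(np.polyder(coeffs), steps)
    return float(slopes.mean())


def update_source_weight(weight, errors, decay=0.5):
    """Hypothetical rule: decay the source-ensemble weight when the
    smoothed target error over the sub-interval is flat or rising."""
    if interval_slope(errors) >= 0.0:
        return weight * decay
    return weight
```

Under these assumptions, a sub-interval with falling error leaves the weight unchanged, while a sub-interval with rising error multiplies it by `decay`, so the contribution of the transferred networks shrinks when it appears to hurt the target task.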
