On the Role of Pre-training for Meta Few-Shot Learning

Chia-You Chen; Hsuan-Tien Lin; Masashi Sugiyama; Gang Niu

Published: 10 Dec 2021, Last Modified: 05 May 2023 | NeurIPS 2021 Workshop MetaLearn Poster | Readers: Everyone

Keywords: Meta-Learning, Episodic Training, Pre-training, Disentanglement

TL;DR: This work examines meta few-shot learning from a new perspective: the role that pre-training plays for episodic training.

Abstract: Few-shot learning aims to classify examples from unknown classes given only a few new examples per class. There are two key routes for few-shot learning. One is to (pre-)train a classifier with examples from known classes, and then transfer the pre-trained classifier to unknown classes using the new examples. The other, called meta few-shot learning, is to couple pre-training with episodic training, which consists of episodes of few-shot learning tasks simulated from the known classes. Pre-training is known to play a crucial role for the transfer route, but its role for the episodic route is less clear. In this work, we study the role of pre-training for the episodic route. We find that pre-training plays a major role in disentangling the representations of known classes, which makes the resulting learning tasks easier for episodic training. This finding allows us to shift the huge simulation burden of episodic training to a simpler pre-training stage. We demonstrate the benefit of this shift by designing a new disentanglement-based pre-training model, which helps episodic training achieve competitive performance more efficiently.
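To make the notion of an episode concrete, the following is a minimal sketch of how one N-way K-shot task might be simulated from known classes and then solved with a nearest-prototype rule in the style of ProtoNets. It is an illustration only, not the authors' implementation: the function names (`sample_episode`, `prototypes`, `predict`), the toy 2-D features, and the class names are all hypothetical, and the well-separated clusters merely stand in for disentangled representations.

```python
import random
from collections import defaultdict


def sample_episode(examples_by_class, n_way, k_shot, n_query, rng):
    """Simulate one N-way K-shot task from known classes (hypothetical helper)."""
    classes = rng.sample(sorted(examples_by_class), n_way)
    support, query = [], []
    for episode_label, cls in enumerate(classes):
        # Draw disjoint support and query examples for this class.
        picked = rng.sample(examples_by_class[cls], k_shot + n_query)
        support += [(x, episode_label) for x in picked[:k_shot]]
        query += [(x, episode_label) for x in picked[k_shot:]]
    return support, query


def prototypes(support):
    """Mean feature vector per episode label (a ProtoNet-style class centre)."""
    grouped = defaultdict(list)
    for x, y in support:
        grouped[y].append(x)
    return {y: [sum(col) / len(xs) for col in zip(*xs)] for y, xs in grouped.items()}


def predict(x, protos):
    """Assign x to the nearest prototype under squared Euclidean distance."""
    return min(protos, key=lambda y: sum((a - b) ** 2 for a, b in zip(x, protos[y])))


# Toy, hand-made 2-D "features" (assumption): one tight cluster per class.
rng = random.Random(0)
centres = {"cat": (0.0, 0.0), "dog": (10.0, 0.0), "bird": (0.0, 10.0), "fish": (10.0, 10.0)}
examples_by_class = {
    name: [(cx + rng.uniform(-1, 1), cy + rng.uniform(-1, 1)) for _ in range(20)]
    for name, (cx, cy) in centres.items()
}

# One 3-way 5-shot episode with 5 query examples per class.
support, query = sample_episode(examples_by_class, n_way=3, k_shot=5, n_query=5, rng=rng)
protos = prototypes(support)
accuracy = sum(predict(x, protos) == y for x, y in query) / len(query)
```

In this toy setting the clusters are far apart relative to their spread, so every query point lies closest to its own class prototype. That mirrors the intuition in the abstract: when class representations are already well separated, each simulated episode becomes an easier task.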

Contribution Process Agreement: Yes

Poster Session Selection: Poster session #1 (12:00 UTC+1)

Author Revision Details:

> The fact that better disentanglement could benefit ProtoNets seems fairly intuitive; it could be interesting to study other methods in the context such as MAML, or even Proto-MAML which is a combination of both methods.

We tried the MAML experiment, but the results did not show an improvement. One likely reason is that the surrogate loss we introduce is much closer to the metric-based framework than to the MAML-like framework.

> Why isn't W_i updated during backpropagation?

W is learnable.
