Revisiting Linear Decision Boundaries for Few-Shot Learning with Transformer Hypernetworks

29 Sept 2021 (modified: 13 Feb 2023) · ICLR 2022 Conference Withdrawn Submission
Keywords: Few-Shot Learning, Meta learning, Hypernetworks, Transformers
Abstract: Few-shot learning (FSL) methods aim to generalize a model to new, unseen classes using only a small number of support examples. In image classification settings, many FSL approaches adopt an architecture similar to standard supervised learning: a feature extractor followed by a linear classifier head. A common choice for the classifier is ProtoNet-style nearest neighbor, but this may be suboptimal because it is context-independent. As an alternative, some methods train a parametric classifier (e.g. logistic regression, support vector machine) on embeddings from the novel classes. However, task-specific training requires time and resources and poses optimization challenges such as overfitting to only a few samples. Instead, we propose to generate linear classifiers for new classes with a transformer-based hypernetwork that performs context aggregation in a permutation-invariant manner. The transformer hypernetwork lets us instantiate a new task-specific classifier without any additional training on novel tasks. Experiments on 1-shot 5-way and 5-shot 5-way MiniImageNet, TieredImageNet, and CIFAR-FS show that transformer hypernetworks generate classifiers that achieve up to 1.4% higher accuracy than other commonly used linear classifiers. Among methods that offer optimization-free meta-inference, we achieve a new state of the art in most cases.
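
To make the idea concrete, here is a minimal PyTorch-style sketch of a transformer hypernetwork that maps support-set embeddings to per-class linear classifier weights. All names (TransformerHypernetwork, weight_head, etc.) and architectural choices (layer count, mean pooling, embedding width) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class TransformerHypernetwork(nn.Module):
    """Sketch: generate per-class linear classifier weights from support embeddings."""

    def __init__(self, emb_dim: int = 640, n_heads: int = 8, n_layers: int = 2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=emb_dim, nhead=n_heads, batch_first=True
        )
        # Self-attention over the support set with no positional encodings,
        # so the generated classifier does not depend on the order of
        # support examples (permutation invariance after pooling).
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.weight_head = nn.Linear(emb_dim, emb_dim)  # per-class weight vector
        self.bias_head = nn.Linear(emb_dim, 1)          # per-class bias

    def forward(self, support_emb: torch.Tensor):
        # support_emb: (n_way, k_shot, emb_dim), embeddings from a frozen
        # feature extractor, grouped by class.
        n_way, k_shot, d = support_emb.shape
        # Context aggregation: attend over all n_way * k_shot support tokens
        # so each class's classifier is conditioned on the whole task.
        tokens = support_emb.reshape(1, n_way * k_shot, d)
        ctx = self.encoder(tokens).reshape(n_way, k_shot, d)
        class_repr = ctx.mean(dim=1)                # (n_way, emb_dim)
        W = self.weight_head(class_repr)            # (n_way, emb_dim)
        b = self.bias_head(class_repr).squeeze(-1)  # (n_way,)
        return W, b

# Usage: classify query embeddings with the generated linear head,
# with no gradient steps on the novel task.
hypernet = TransformerHypernetwork()
support = torch.randn(5, 1, 640)   # 1-shot 5-way support embeddings
queries = torch.randn(75, 640)     # query embeddings
W, b = hypernet(support)
logits = queries @ W.t() + b       # (75, 5) class scores
```

The key design point this sketch illustrates is that meta-inference is a single forward pass: the hypernetwork replaces per-task optimization (as in logistic regression or SVM heads) with amortized generation of the classifier weights.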
One-sentence Summary: We propose a transformer-based hypernetwork that generates more accurate linear classifiers for few-shot learning.