Keywords: meta-learning, attention, weight, initialisation, optimisation
TL;DR: We propose to weight the tasks in a batch according to their "importance" in improving the meta-model's learning in a meta-learning setting.
Abstract: Meta-learning (ML) has emerged as a promising direction for learning models in resource-constrained settings such as few-shot learning. The popular approaches to ML learn either a generalizable initial model or a generic parametric optimizer through episodic training. The former approaches leverage the knowledge from a batch of tasks to learn an optimal prior. In this work, we study the importance of tasks in a batch for ML. We hypothesize that the common assumption in batch episodic training, that each task in a batch contributes equally to learning an optimal meta-model, need not be true. We propose to weight the tasks in a batch according to their "importance" in improving the meta-model's learning. To this end, we introduce a training curriculum, called task-attended meta-training, to weight the tasks in a batch. The task attention is a standalone unit and can be integrated with any batch episodic training regimen. Comparisons of the task-attended ML models with their non-task-attended counterparts on complex datasets such as miniImageNet, FC100 and tieredImageNet validate its effectiveness.
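The difference between equal task contribution and a weighted batch can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the weights are produced, so the sketch assumes a softmax over hypothetical per-task scores. The function names, losses and scores are all made up for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def uniform_meta_loss(task_losses):
    """Standard batch episodic training: every task contributes equally."""
    return sum(task_losses) / len(task_losses)


def attended_meta_loss(task_losses, task_scores):
    """Weighted meta-loss. The weights come from a softmax over
    hypothetical importance scores (an assumption of this sketch)."""
    weights = softmax(task_scores)
    loss = sum(w * l for w, l in zip(weights, task_losses))
    return loss, weights


# Made-up query-set losses for a meta-batch of 4 tasks.
task_losses = [0.9, 1.4, 0.6, 1.1]
# Made-up importance scores, e.g. the output of some attention unit.
task_scores = [0.2, 1.0, -0.5, 0.3]

baseline = uniform_meta_loss(task_losses)
attended, weights = attended_meta_loss(task_losses, task_scores)
```

When all scores are equal, the softmax weights reduce to 1/N and the attended loss matches the uniform average, so the unweighted regimen is a special case of the weighted one.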
Contribution Process Agreement: Yes
Author Revision Details: 1. We thank the reviewers for bringing the discrepancy between Figure 2 and Figure 4 to our notice. We have updated Figure 2 for 60,000 iterations. The observation that "the models meta-trained with TA regimen tend to achieve higher/at-par performance in fewer iterations than the corresponding models meta-trained with the non-TA regimen" still stands.
2. The choice of a batch size of 4 follows [1,2]. However, we have added a study of its impact in Table 1 of the supplementary material. Specifically, we compared the few-shot classification performance of MAML and TA-MAML on the miniImageNet dataset with a meta-batch size of 6 for 5-way and 10-way (1-shot and 5-shot) settings. We observe that TA-MAML consistently performs better than MAML, and that increasing the number of tasks in a batch improves the performance of both MAML and TA-MAML. However, hardware constraints restricted the study of the 10-way 5-shot setting and of meta-batch sizes of 8 or higher.
[1] Chelsea Finn, Pieter Abbeel, and Sergey Levine. Model-agnostic meta-learning for fast adaptation of deep networks. In ICML, 2017.
[2] Muhammad Abdullah Jamal and Guo-Jun Qi. Task agnostic meta-learning for few-shot learning. In CVPR, 2019.
Process Comment: I am thankful for the opportunity to review the papers for this prestigious workshop. I felt that the timeline to complete the first phase of reviews was a little tight but doable. An optimally relaxed deadline would be great in the future!
Poster Session Selection: Poster session #1 (12:00 UTC+1)
Community Implementations: [![CatalyzeX](/images/catalyzex_icon.svg) 1 code implementation](https://www.catalyzex.com/paper/arxiv:2106.10642/code)