Expanding continual few-shot learning benchmarks to include recognition of specific instances

TMLR Paper1376 Authors

13 Jul 2023 (modified: 17 Sept 2024) · Rejected by TMLR · CC BY 4.0
Abstract: Continual learning and few-shot learning are important frontiers in progress towards broader Machine Learning (ML) capabilities. There is a growing body of work in each, but few works combine the two. One exception is the Continual Few-shot Learning (CFSL) framework of Antoniou et al. (2020). In this study, we extend CFSL in two ways that capture a broader range of challenges important for intelligent agent behaviour in real-world conditions. First, we modify CFSL to make it more comparable to standard continual learning experiments, where usually a much larger number of classes is presented. Second, we introduce an ‘instance test’, which requires recognition of specific instances of classes – a capability of animal cognition that is usually neglected in ML. For an initial exploration of ML model performance under these conditions, we selected representative baseline models from the original CFSL work and added a model variant with replay. As expected, learning more classes is more difficult than in the original CFSL experiments, and interestingly, the way in which image instances and classes are presented affects classification performance. Surprisingly, accuracy in the baseline instance test is comparable to that of other classification tasks, but poor under significant occlusion and noise. The use of replay for consolidation improves performance substantially for both types of tasks, but particularly for the instance test.
Submission Length: Long submission (more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=dsGJo0GuF5
Changes Since Last Submission: Thank you very much for reviewing our paper and providing constructive and insightful feedback. In response, we altered the narrative of the paper, ran additional experiments with additional datasets, and made extensive changes throughout, which substantially improved the quality of the paper. In summary, the focus shifted away from the models and towards improving the benchmark itself, by concentrating on the instance test and adding further instance tests that incorporate varying levels of noise and occlusion. We also added a more complex dataset (SlimageNet64) to all experiments. There were common themes across the reviews, and each resulted in large changes, so we explain the changes in broad terms. Due to the character limit here, we could not include excerpts.

__Theme: Literature review__

_The reviewers pointed out that the related work was missing and the literature review was shallow._

Thank you for pointing this out and for the suggested references. We improved the introduction to better contextualize the research within the broader literature, and added a Related Work section that includes many of the papers the reviewers suggested, as well as others we felt were relevant. The related work covers key benchmarks, key tasks, and architectural approaches.

_Additionally, Reviewer 2 asked “how the proposed method differs from previous continual learning approaches that also incorporate replay”._

We added Section 2.4 on other replay methods.

__Theme: Motivation and contributions__

_The reviewers requested that we clearly identify the research gap and motivation and add a ‘contributions’ section._

These comments were very helpful and catalyzed a shift in emphasis. We re-evaluated the core contribution, which is primarily to extend the Continual Few-shot Learning paradigm rather than to introduce a new method (replay). The new Introduction and Related Work sections set up the background, identify the gap, and describe the motivation, including justification for the experiments (expanded in the method section). We also added a Contributions section to explicitly describe our specific contributions.

__Theme: Title__

_The reviewers pointed out that the title was inappropriate because we do not use a biologically realistic model and because the key contributions relate to the benchmark rather than to replay._

We agree with these comments. We changed the emphasis of the paper as described above, and changed the title accordingly to “Expanding continual few-shot learning benchmarks to include recognition of specific instances”.

__Theme: Experiment design: clarity and justification__

_The reviewers pointed out that the experimental method lacked detail, including the hardware used, details of the training method, the hyperparameter search method, the replay method, and a description of the instance test, and that training methods were incorrectly named as architectures._

We substantially expanded and improved the whole experimental details section to address all of the above. We added:
- Details of hardware (Section 3.5)
- The hyperparameter search method (Section 3.4)
- Corrected descriptions and naming of the training methods
- A more thorough description of the replay method (Section 3.4.3)
- A more thorough description of the instance test, including Figure 3
- A more thorough description of the Pretrain+Tune process, including Figure 4
- A more thorough description of the underlying architectures of both training methods, providing the necessary context to compare their performance (Section 3.4 and subsections)

__Theme: More experiments__

_The reviewers requested more difficult datasets._

We expanded the experiments from Omniglot alone to also include SlimageNet64 (the second dataset used in the paper by Antoniou, on which we based our experiments).

_There were other suggestions for additional experiments, such as adding replay to ProtoNets or adding additional baselines such as ResNet architectures._

These are all good suggestions. Given the new narrative, which focuses on the benchmark as the main contribution, we felt that the best way to expand the experiments was to add SlimageNet64 and the additional instance tests with noise and occlusion, and that the existing baselines substantiate our contribution. We hope that the benchmark inspires others to try additional architectures and training methods. Regarding ProtoNets specifically, we added an explanation in Section 3.4.3 and a discussion of other architectures in Limitations & Future Work.

We will post a more thorough response on the page of the previous version of the article, which we withdrew following the review due to the time it would have taken to run more experiments and rewrite: https://openreview.net/forum?id=dsGJo0GuF5
Assigned Action Editor: ~Eleni_Triantafillou1
Submission Number: 1376