Abstract: The predominant method for computing confidence intervals (CIs) in few-shot learning (FSL) samples tasks with replacement, i.e., it allows the same samples to appear in multiple tasks. This makes the CI misleading: it accounts for the randomness of the sampler but not of the data itself. To quantify the extent of this problem, we conduct a comparative analysis between CIs computed with and without replacement, which reveals a notable underestimation by the predominant method. This observation calls for a reevaluation of how we interpret confidence intervals and the resulting conclusions in FSL comparative studies. Our research demonstrates that paired tests can partially address this issue. Additionally, we explore methods to further reduce the size of the CI by strategically sampling tasks of a specific size. We also introduce a new optimized benchmark, which can be accessed at https://github.com/RafLaf/FSL-benchmark-again
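For concreteness, here is a minimal, self-contained Python sketch (not the authors' released code; `test_pool` and `evaluate_task` are hypothetical placeholders for a real few-shot benchmark) contrasting the two CI computations described in the abstract: tasks sampled with replacement from the test pool versus disjoint tasks that use each test sample at most once.

```python
# Sketch only: contrasts CI computation over tasks sampled WITH replacement
# (the predominant practice) vs. disjoint tasks sampled WITHOUT replacement.
import numpy as np

rng = np.random.default_rng(0)
test_pool = np.arange(10_000)  # indices of a hypothetical test set

def evaluate_task(sample_indices):
    """Dummy per-task accuracy; a real benchmark would run a few-shot model."""
    return 0.7 + 0.05 * rng.standard_normal()

def ci95_half_width(accs):
    """Half-width of the usual normal-approximation 95% CI over task accuracies."""
    accs = np.asarray(accs)
    return 1.96 * accs.std(ddof=1) / np.sqrt(len(accs))

# Predominant method: each task is drawn independently from the full pool,
# so the same test sample can appear in many tasks.
with_repl = [evaluate_task(rng.choice(test_pool, size=100, replace=False))
             for _ in range(1_000)]

# Alternative: partition the pool into disjoint tasks, so each test sample
# is used at most once across all tasks.
perm = rng.permutation(test_pool)
without_repl = [evaluate_task(perm[i:i + 100])
                for i in range(0, len(perm), 100)]

print(f"with replacement:    ±{ci95_half_width(with_repl):.4f} ({len(with_repl)} tasks)")
print(f"without replacement: ±{ci95_half_width(without_repl):.4f} ({len(without_repl)} tasks)")
```

With the dummy evaluator the two intervals look alike; with a real model, reusing samples across tasks correlates the per-task accuracies, which is exactly the effect the standard CI ignores.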
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: Dear reviewers and AC,
Thank you for accepting our paper.
1) We rephrased one sentence that read awkwardly, without changing its meaning:
Before:
*"Our investigation using Open Confidence Intervals (OCIs) can lead to inconsistent conclusions to using the classical approach in the field of few-shot learning"*
After:
*"Our investigation using Open Confidence Intervals (OCIs) can lead to conclusions that are inconsistent with those obtained using the classical approach in the field of few-shot learning."*
2) We updated the abstract with the non-anonymous GitHub link.
3) We made a YouTube video about the paper: https://www.youtube.com/watch?v=OB95-BLxE0s
Video: https://www.youtube.com/watch?v=OB95-BLxE0s
Code: https://github.com/RafLaf/FSL-benchmark-again
Assigned Action Editor: ~Eleni_Triantafillou1
Submission Number: 2678