Self-Supervised Learning in the Twilight of Noisy Real-World Datasets

Atharva Tendle, Andrew Little, Stephen D. Scott, Mohammad Rashedul Hasan

Published: 2022, Last Modified: 16 Jun 2023, ICMLA 2022

Abstract: Despite efforts to benchmark self-supervised learning (SSL) methods on image-recognition transfer-learning tasks, our understanding of how these methods perform on noisy real-world datasets remains limited. This paper presents an extensive analysis of various types of SSL methods on real-world datasets containing noisy images of wild animals. These uncurated images are captured automatically by motion-activated cameras, or camera traps, installed in the wild. The camera-trap datasets exhibit several types of bias that are typical of practical tasks. Using a set of biased datasets of varying sizes, we compare the supervised learning (SL) method with two types of SSL methods: instance discrimination and cluster discrimination. Our results reveal nuances in SSL's performance. For example, we show that although SSL methods are often more generalizable than the SL method, the performance gain of some SSL methods diminishes as the size of the target dataset decreases. There is also significant variability in the effectiveness of the two types of SSL methods. Finally, we show that, unlike SL, both types of SSL benefit from increased model capacity.
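The abstract names two families of SSL objectives without defining them. As a rough illustration only, the following sketch contrasts a simplified SimCLR-style InfoNCE loss (instance discrimination: each image must pick out its own second augmented view among the other images in the batch) with a simplified SwAV-style swapped-prediction loss (cluster discrimination: the two views are matched through soft assignments to prototypes, balanced with a few Sinkhorn-Knopp iterations). The choice of SimCLR and SwAV as representatives, the function names, the toy vectors, and the default `temperature` and `epsilon` values are all assumptions made for illustration. This is not the code or the method configuration used in the paper.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def normalize(v: Vector) -> Vector:
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def log_softmax(logits: Vector) -> Vector:
    """Numerically stable log-softmax."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def instance_discrimination_loss(z1: Matrix, z2: Matrix, temperature: float = 0.1) -> float:
    """Simplified one-directional InfoNCE (SimCLR-style, assumed for illustration).

    Row i of z1 and row i of z2 are two augmented views of the same image;
    every other row of z2 serves as a negative for anchor z1[i].
    """
    a = [normalize(v) for v in z1]
    b = [normalize(v) for v in z2]
    total = 0.0
    for i, anchor in enumerate(a):
        logits = [dot(anchor, cand) / temperature for cand in b]
        total -= log_softmax(logits)[i]  # cross-entropy with target index i
    return total / len(a)


def sinkhorn(scores: Matrix, epsilon: float = 0.05, iters: int = 3) -> Matrix:
    """Soft cluster codes whose rows sum to 1 while prototypes are used roughly equally."""
    q = [[math.exp(s / epsilon) for s in row] for row in scores]
    n, k = len(q), len(q[0])
    for _ in range(iters):
        for j in range(k):  # each prototype receives total mass 1/k
            col = sum(q[i][j] for i in range(n))
            for i in range(n):
                q[i][j] /= col * k
        for i in range(n):  # each sample carries total mass 1/n
            row = sum(q[i])
            q[i] = [x / (row * n) for x in q[i]]
    return [[x * n for x in row] for row in q]


def cluster_discrimination_loss(z1: Matrix, z2: Matrix, prototypes: Matrix,
                                temperature: float = 0.1) -> float:
    """Simplified swapped prediction (SwAV-style, assumed for illustration):
    the codes of one view are the targets for the prototype prediction of the other."""
    c = [normalize(p) for p in prototypes]
    s1 = [[dot(normalize(v), p) for p in c] for v in z1]
    s2 = [[dot(normalize(v), p) for p in c] for v in z2]
    q1, q2 = sinkhorn(s1), sinkhorn(s2)  # targets; in training no gradient flows through these
    total = 0.0
    for i in range(len(z1)):
        logp1 = log_softmax([s / temperature for s in s1[i]])
        logp2 = log_softmax([s / temperature for s in s2[i]])
        total -= sum(q2[i][j] * logp1[j] + q1[i][j] * logp2[j] for j in range(len(c)))
    return total / (2 * len(z1))


# Hypothetical toy embeddings: two views of three images.
views_a = [[1.0, 0.1], [0.0, 1.0], [-1.0, 0.2]]
views_b = [[0.9, 0.2], [0.1, 0.9], [-0.8, 0.1]]  # perturbed second views
shuffled = [views_b[1], views_b[2], views_b[0]]  # mismatched pairs
protos = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]  # hypothetical fixed prototypes

aligned_loss = instance_discrimination_loss(views_a, views_b)
shuffled_loss = instance_discrimination_loss(views_a, shuffled)
swav_loss = cluster_discrimination_loss(views_a, views_b, protos)
print(f"instance (aligned):  {aligned_loss:.4f}")
print(f"instance (shuffled): {shuffled_loss:.4f}")
print(f"cluster:             {swav_loss:.4f}")
```

Correctly paired views yield a much lower instance-discrimination loss than shuffled pairs, because each anchor's positive is then its most similar candidate. In a real pipeline the embeddings and prototypes would be produced and trained by a neural-network encoder; here they are fixed toy vectors.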
