Lightweight Filtering of Noisy Web Data: Augmenting Fine-grained Datasets with Selected Internet Images

Published: 01 Jan 2021, Last Modified: 06 Mar 2025, Venue: VISIGRAPP (5: VISAPP) 2021, License: CC BY-SA 4.0
Abstract: Despite the availability of huge annotated benchmark datasets and the potential of transfer learning, i.e., fine-tuning a pre-trained neural network to a specific task, deep learning struggles in applications where no labeled datasets of sufficient size exist. This issue affects fine-grained recognition tasks the most, since correct image annotations are expensive and require expert knowledge. Nevertheless, the Internet offers a large number of weakly annotated images. In contrast to existing work, we suggest a new lightweight filtering strategy to exploit this source of information without supervision and at minimal additional cost. Our main contributions are specific filter operations that allow the selection of downloaded images to augment a training set. We filter out test duplicates, to avoid a biased evaluation of the methods, as well as two types of label noise: cross-domain noise, i.e., images outside any class in the dataset, and cross-class noise, a form of label-swapping noise. We evaluate o