Abstract: Collecting large-scale multi-label data with \emph{full labels} is difficult in real-world scenarios. Many existing studies address the missing labels that result from annotation but ignore the difficulties encountered during the annotation process itself. We attribute the high annotation workload to two causes: (1) annotators must identify labels across widely varying visual concepts, and (2) exhaustively annotating the entire dataset with all labels is difficult and time-consuming. In this paper, we propose a new setting, block diagonal labels, to reduce the workload on both fronts. The categories are divided into subsets based on semantics and relevance, and each annotator focuses only on their own subset, so that only a small set of highly relevant labels needs to be annotated per image. To handle the resulting \emph{missing labels}, we introduce a simple yet effective method that requires no prior knowledge of the dataset. Specifically, we propose an Adaptive Pseudo-Labeling method to predict the unknown labels with less noise. We provide a formal analysis of the advantages of our setting, and extensive experiments on multiple widely used benchmarks verify the effectiveness of our method.
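As a rough illustration of the block diagonal label setting described in the abstract, the following minimal sketch builds an observation mask in which each group of images is annotated only with its own label subset. The function name `block_diagonal_mask`, the contiguous near-equal split of images into groups, and the toy subsets are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np


def block_diagonal_mask(num_images, label_subsets):
    """Return a 0/1 mask; mask[i, j] == 1 means label j is annotated for image i.

    Assumptions (hypothetical, not from the paper): the label subsets
    partition the label indices 0..L-1, images are split into contiguous
    near-equal groups, and group k is annotated only with label_subsets[k].
    """
    num_labels = sum(len(subset) for subset in label_subsets)
    mask = np.zeros((num_images, num_labels), dtype=np.int8)
    # Split image indices into one contiguous group per label subset.
    image_groups = np.array_split(np.arange(num_images), len(label_subsets))
    for rows, cols in zip(image_groups, label_subsets):
        # Mark the (image group, label subset) block as observed.
        mask[np.ix_(rows, cols)] = 1
    return mask


# Toy example: 6 images, 5 labels split into two semantic subsets.
subsets = [[0, 1, 2], [3, 4]]
mask = block_diagonal_mask(6, subsets)
observed_fraction = mask.mean()
```

Entries where the mask is zero correspond to the missing labels that a pseudo-labeling method would need to predict. With rows ordered by image group and columns by label subset, the observed entries form blocks along the diagonal.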
Primary Subject Area: [Content] Media Interpretation
Secondary Subject Area: [Content] Media Interpretation
Relevance To Conference: In real-world applications, obtaining a large multi-label dataset is a significant challenge. We analyze the sources of the high workload in the annotation process and hope that our efforts will facilitate improvements in this domain.
Supplementary Material: zip
Submission Number: 1933