The Unexplored Potential of Vision-Language Models for Generating Large-Scale Complementary-Label Learning Data
Abstract: Complementary-Label Learning (CLL) is a weakly-supervised learning paradigm designed to reduce label-collection costs compared to traditional supervised learning with ordinary labels. However, its competitiveness and feasibility in real-world scenarios remain to be established. Although recent CLL studies using real-world datasets with human annotations have begun to explore these challenges, annotating complementary labels still incurs a non-trivial cost. Consequently, the real-world data currently available is insufficient to fully demonstrate the practical scalability of CLL. The emergence of Vision-Language Models (VLMs) offers a promising alternative for addressing this limitation. However, our analysis shows that directly transferring the human labeling process to VLMs introduces significant label noise and bias. To address this issue, we develop customized prompts that systematically reduce label noise and bias in VLM-based labeling. Our proposed framework effectively curates VLM-annotated datasets, achieving a 10% performance improvement over human-annotated datasets. This work represents a significant step toward making CLL viable for real-world applications.
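The paper's actual prompts and pipeline are not reproduced in this abstract; the snippet below is only a minimal sketch of how VLM-based complementary-label annotation could be set up under common CLL assumptions. The function `sample_complementary_label` and the `query_vlm` callable are hypothetical names introduced here for illustration: the candidate class is drawn uniformly at random (matching the uniform complementary-label assumption often made in CLL, which limits bias), and a constrained yes/no prompt is used to limit free-form answers (a source of label noise).

```python
import random
from typing import Callable, List, Optional

def sample_complementary_label(
    image_path: str,
    class_names: List[str],
    query_vlm: Callable[[str, str], str],
    max_tries: int = 5,
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Draw a candidate class uniformly at random and ask the VLM to verify
    that the image does NOT depict it. Uniform sampling keeps the resulting
    complementary-label distribution close to the uniform assumption used by
    many CLL methods, which helps limit annotation bias."""
    rng = rng or random.Random()
    for _ in range(max_tries):
        candidate = rng.choice(class_names)
        # A constrained yes/no prompt reduces free-form answers and parsing noise.
        prompt = (
            "Look at the image and answer with a single word, 'yes' or 'no'. "
            f"Does the image show a {candidate}?"
        )
        answer = query_vlm(prompt, image_path).strip().lower()
        if answer.startswith("no"):
            return candidate  # candidate is taken as the complementary label
        # Otherwise the candidate may be the true class; resample and retry.
    return None  # caller may skip images for which no label was obtained
```

In practice, `query_vlm` would wrap whichever VLM is used; candidates the VLM affirms are simply resampled, so an image is only assigned its own class as a complementary label if the VLM itself errs.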