Semi-Supervised Multi-View Multi-Label Learning with View-Specific Transformer and Enhanced Pseudo-Label

Published: 01 Jan 2025, Last Modified: 14 May 2025AAAI 2025EveryoneRevisionsBibTeXCC BY-SA 4.0
Abstract: Multi-view multi-label learning has become a research focus for describing objects with rich expressions and annotations. However, real-world data often contains numerous unlabeled instances, due to the high cost and technical limitations of manual labeling. This crucial problem involves three main challenges: i) How to extract advanced semantics from available views? ii) How to build a refined classification framework with limited labeled space? iii) How to provide more high-quality supervisory information? To address these problems, we propose a Semi-Supervised Multi-View Multi-Label Learning Method with View-Specific Transformer and Enhanced Pseudo-Label named SMVTEP. Specifically, Generative Adversarial Networks are employed to extract informative shared and specific representations and their consistency and distinctiveness are ensured through the adversarial mechanism and information theory based contrastive learning. Then we build specific classifiers for each extracted feature and apply instance-level manifold constraints to reduce bias across classifiers. Moreover, we design a transformer-style fusion approach that simultaneously captures the imbalance of expressive power among views, mapping effects on specific labels, and label dependencies by incorporating confidence scores and category semantics into the self-attention mechanism. Furthermore, after using Mixup for data augmentation, category-enhanced pseudo-labels are leveraged to improve the reliability of additional annotations by aligning the label distribution of unlabeled samples with the true distribution. Finally, extensive experimental results validate the effectiveness of SMVTEP against state-of-the-art methods.
Loading