Label Decoupling and Reconstruction: A Two-Stage Training Framework for Long-tailed Multi-label Medical Image Recognition

Published: 20 Jul 2024, Last Modified: 06 Aug 2024 · MM 2024 Poster · CC BY 4.0
Abstract: Deep learning has made significant advances in medical image recognition. Clinical reality, however, is complex: patients often present with multiple co-occurring diseases of unequal prevalence, so medical datasets are frequently multi-label and long-tailed. In this paper, we propose a label decoupling and reconstruction method (LDRNet) to address these two challenges. Label decoupling fuses semantic information from categories and images to capture class-aware features for each label; this fusion improves the model's ability to recognize diseases and supports a deeper understanding of disease characteristics within the dataset. Label reconstruction then uses these class-aware features to rebuild the label distribution, generating a diverse set of virtual features for tail categories, which promotes unbiased learning in the classifier and significantly enhances the model's generalization and robustness. Extensive experiments on three multi-label long-tailed medical image datasets (the Axial Spondyloarthritis, NIH Chest X-ray 14, and ODIR-5K datasets) demonstrate that our approach achieves state-of-the-art performance, confirming its effectiveness at handling the combination of multi-label and long-tailed distributions in medical image recognition.
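The label-reconstruction step described above, sampling virtual tail-class features from a distribution fitted to class-aware features, can be sketched as follows. This is a minimal illustration under the assumption of a per-class Gaussian with diagonal covariance and a single class per feature vector (a simplification of the multi-label setting); all function and parameter names are hypothetical, not the authors' released code.

```python
import numpy as np

def reconstruct_tail_features(features, labels, n_virtual=50, rng=None):
    """Fit a per-class Gaussian to class-aware features and sample
    virtual features for under-represented (tail) classes.

    features : (N, D) array of class-aware feature vectors
    labels   : (N,) integer class ids (one class per row; a
               simplification of the true multi-label setting)
    n_virtual: target number of samples per class after augmentation
    """
    rng = np.random.default_rng(rng)
    virtual_x, virtual_y = [], []
    for c in np.unique(labels):
        class_feats = features[labels == c]
        if len(class_feats) >= n_virtual:
            continue  # head class: enough real samples already
        mu = class_feats.mean(axis=0)
        # diagonal covariance keeps sampling stable for tiny classes
        var = class_feats.var(axis=0) + 1e-6
        n_new = n_virtual - len(class_feats)
        samples = rng.normal(mu, np.sqrt(var), size=(n_new, mu.shape[0]))
        virtual_x.append(samples)
        virtual_y.append(np.full(n_new, c))
    if not virtual_x:
        return np.empty((0, features.shape[1])), np.empty((0,), dtype=int)
    return np.vstack(virtual_x), np.concatenate(virtual_y)
```

The virtual features would be concatenated with the real ones before the classifier stage, so tail categories contribute roughly as many training examples as head categories.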
Primary Subject Area: [Experience] Multimedia Applications
Secondary Subject Area: [Experience] Multimedia Applications
Relevance To Conference: This study advances multimodal medical image processing by introducing a robust method for long-tailed multi-label classification that fuses label semantics with deep visual features. The approach improves recognition of underrepresented categories, a critical challenge in long-tailed multi-label medical datasets. Integrating textual and visual modalities, through Transformer-based label decoupling and Gaussian distribution-based feature reconstruction, yields gains in both robustness and accuracy. The model handles rare and prevalent diseases alike, extending the capabilities of multimodal systems, and its demonstrated effectiveness on benchmark medical datasets indicates strong potential for medical imaging applications.
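The Transformer-based label decoupling mentioned above, in which label (semantic) embeddings attend to image features to produce one class-aware feature per label, could be sketched as a single cross-attention step. This is a hedged sketch of the general query-based decoupling idea, not the paper's exact architecture; all names and shapes are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def decouple_labels(image_tokens, label_embeds):
    """One cross-attention step: each label embedding queries the
    image tokens, yielding a class-aware feature per label.

    image_tokens : (T, D) flattened visual features from the backbone
    label_embeds : (C, D) learned label (semantic) embeddings
    returns      : (C, D) class-aware features, one row per label
    """
    d = image_tokens.shape[1]
    # scaled dot-product attention weights, (C, T)
    attn = softmax(label_embeds @ image_tokens.T / np.sqrt(d), axis=-1)
    return attn @ image_tokens
```

In a full model this step would be stacked with feed-forward layers and repeated; the resulting per-label features are exactly what the reconstruction stage fits its class-wise distributions to.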
Supplementary Material: zip
Submission Number: 1226
