Dancing with Discrepancies: Commonality Specificity Attention GAN for Weakly Supervised Medical Lesion Segmentation

ICLR 2025 Conference Submission 544 Authors

13 Sept 2024 (modified: 28 Nov 2024) · ICLR 2025 Conference Submission · CC BY 4.0
Keywords: medical image segmentation, weakly supervised segmentation
Abstract: A growing number of weakly supervised semantic segmentation methods segment targets using only image-level labels. However, few works address the characteristics specific to medical images, a gap that deserves considerable attention. In this paper, we note that: (i) lesion regions typically exhibit a sharp probability distribution pattern while healthy tissues follow an underlying homogeneous distribution, which differs from typical natural images; (ii) the boundaries between lesion foregrounds and structural backgrounds are blurred; (iii) similar structures frequently appear within specific organs or tissues, which makes it difficult to focus models’ attention on regions of interest rather than the entire image. We therefore propose a commonality-specificity attention GAN (CoinGAN) to overcome these challenges, which leverages distribution discrepancies to mine the knowledge underlying images. Specifically, we propose a new form of convolution, contrastive convolution, which uses the fine-grained perceptual discrepancies of activation sub-maps to enhance the intra-image distribution, making lesion foregrounds (specificity) and structural backgrounds (commonality) boundary-aware. A commonality-specificity attention mechanism and a GAN-based loss function are then devised to jointly suppress regions that are similar across images with different labels and accentuate regions that differ between them. This isolates lesion areas from the structural background. Extensive experiments are conducted on three public benchmarks. CoinGAN achieves state-of-the-art performance, with DSC of 71.69%, 84.73%, and 78.32% on the QaTa-COV19, ISIC2018, and MoNuSeg datasets, respectively, contributing to the detection of pneumonia, skin disease, and cancer. The visualized results further corroborate the effectiveness of CoinGAN in segmenting medical objects.
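The DSC values reported in the abstract refer to the Dice similarity coefficient, commonly defined for a predicted mask P and a ground-truth mask G as DSC = 2|P ∩ G| / (|P| + |G|). The following is a minimal sketch of that metric, assuming binary masks given as nested lists of 0/1 values; the function name `dice_coefficient` and the convention of returning 1.0 when both masks are empty are illustrative assumptions, not the authors' evaluation code.

```python
from typing import Sequence


def dice_coefficient(
    pred: Sequence[Sequence[int]],
    target: Sequence[Sequence[int]],
) -> float:
    """Dice similarity coefficient between two binary masks of equal shape.

    Hypothetical helper for illustration; not taken from the paper.
    """
    intersection = 0  # pixels marked as lesion in both masks
    pred_sum = 0      # pixels marked as lesion in the prediction
    target_sum = 0    # pixels marked as lesion in the ground truth
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                intersection += 1
            if p:
                pred_sum += 1
            if t:
                target_sum += 1
    if pred_sum + target_sum == 0:
        # Assumption: two empty masks count as perfect agreement.
        return 1.0
    return 2.0 * intersection / (pred_sum + target_sum)


# Toy 2x3 masks: 2 overlapping pixels, 3 predicted, 3 in the ground truth.
pred_mask = [[1, 1, 0], [0, 1, 0]]
true_mask = [[1, 0, 0], [0, 1, 1]]
score = dice_coefficient(pred_mask, true_mask)  # 2 * 2 / (3 + 3)
```

A score of 1.0 indicates identical masks and 0.0 indicates no overlap; the percentages in the abstract are this quantity scaled by 100.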
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 544