Abstract: Chest X-ray images are widely used in clinical diagnosis and treatment planning for thoracic disease, and the processing of medical images has attracted great attention in the machine learning community. However, labeled medical images are limited, and lesion regions usually occupy only a small portion of the image. Most existing methods are therefore prone to learning spurious correlations for classification, resulting in poor generalization. In this paper, we propose a medical generation transformer network based on self-supervised learning and an adversarial strategy, which captures the discriminative label-relevant lesion regions in the images by augmenting the chest X-ray images. In the proposed method, we first localize the label-relevant regions in each transformer layer. We then mask the image while preserving the label-relevant regions and reconstruct the masked image with self-supervised learning. In this way, we can generate additional images that retain the label-relevant regions and use them to fine-tune the classification network. Since the generated images are often noisy for fine-tuning, we adopt adversarial probabilities to weight the importance of each generated image during training. Experimental results on two large-scale and popular chest X-ray datasets show that the proposed method can efficiently leverage the location of lesions to improve classification performance.
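The two core ideas in the abstract — masking an image while keeping only its label-relevant regions, and weighting each generated image's training loss by an adversarial probability — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `keep_ratio` threshold, the per-patch `attention` scores, and the loss-weighting formula are all assumptions made for the example.

```python
import numpy as np

def mask_keep_label_relevant(image, attention, keep_ratio=0.25):
    """Zero out all but the top-`keep_ratio` fraction of patches.

    `attention` holds a per-patch relevance score (e.g. aggregated from
    the transformer's attention maps); this aggregation is assumed here.
    """
    flat = attention.ravel()
    k = max(1, int(keep_ratio * flat.size))
    # Score of the k-th most relevant patch acts as the keep threshold.
    thresh = np.partition(flat, -k)[-k]
    keep = (attention >= thresh).astype(image.dtype)
    return image * keep

def adversarial_weighted_loss(losses, adv_probs):
    """Weight each generated image's loss by its (normalized) adversarial
    probability, so noisy generations contribute less to fine-tuning.
    The normalization scheme is an illustrative choice."""
    w = np.asarray(adv_probs, dtype=float)
    w = w / w.sum()
    return float((w * np.asarray(losses, dtype=float)).sum())
```

For example, with a 4x4 grid of patch scores and `keep_ratio=0.25`, only the four highest-scoring patches of the image survive the mask; the remaining pixels are zeroed before the self-supervised reconstruction step.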
External IDs: dblp:journals/tmm/LiuWD25