Abstract: Lung-infected area segmentation is crucial for assessing the severity of lung diseases. However, existing image-text multi-modal methods typically rely on labour-intensive annotations for model training, which are costly in both time and expertise. To address this issue, we propose a novel attribute-knowledge-guided framework for unsupervised lung-infected area segmentation (AKGNet), which performs segmentation from image-text data alone, without any mask annotation. AKGNet jointly conducts text attribute knowledge learning, attribute-image cross-attention fusion, and high-confidence pseudo-label exploration. It learns statistical information and captures spatial correlations between image and text attributes in the embedding space, iteratively refining the mask to improve segmentation. Specifically, we introduce a text attribute knowledge learning module that extracts attribute knowledge and deploys it for feature representation learning, enabling the model to learn statistical information and adapt to different attributes. Moreover, we devise an attribute-image cross-attention module that exploits the correlations between attributes and images in the embedding space to capture spatial dependencies, selectively focusing on relevant regions. Finally, a self-training mask-refinement process generates pseudo-labels from high-confidence predictions and iteratively improves both the mask and the segmentation. Experimental results on a benchmark medical image dataset show that our method outperforms state-of-the-art segmentation techniques in unsupervised scenarios.
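The abstract names two mechanisms worth making concrete: text attributes querying image features via cross-attention, and self-training on confidence-filtered pseudo-labels. The PyTorch sketch below illustrates both in minimal form; it is not the authors' implementation, and all dimensions, module names, and the 0.9 confidence threshold are illustrative assumptions.

```python
import torch
import torch.nn as nn

class AttributeImageCrossAttention(nn.Module):
    """Illustrative cross-attention: attribute embeddings attend over
    image patch features. Dimensions and names are assumptions, not
    taken from the AKGNet paper."""
    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, attr_emb: torch.Tensor, img_feat: torch.Tensor):
        # attr_emb: (B, K, D) text-attribute token embeddings (queries)
        # img_feat: (B, H*W, D) flattened image features (keys/values)
        # Each attribute gathers spatial context from relevant regions.
        fused, weights = self.attn(attr_emb, img_feat, img_feat)
        return fused, weights  # weights: (B, K, H*W) attention maps

def high_confidence_pseudo_labels(logits: torch.Tensor,
                                  threshold: float = 0.9) -> torch.Tensor:
    """Keep only confidently foreground/background pixels for the
    self-training step; mark the rest -1 so the loss can ignore them.
    The 0.9 threshold is a hypothetical choice."""
    prob = torch.sigmoid(logits)             # (B, 1, H, W) probabilities
    pseudo = torch.full_like(prob, -1.0)     # -1 = ignored pixel
    pseudo[prob > threshold] = 1.0           # confident infected area
    pseudo[prob < 1.0 - threshold] = 0.0     # confident background
    return pseudo

# Example: 7 attribute tokens attending over a 16x16 feature map.
module = AttributeImageCrossAttention()
fused, maps = module(torch.randn(2, 7, 256), torch.randn(2, 256, 256))
labels = high_confidence_pseudo_labels(torch.randn(2, 1, 16, 16))
```

In this reading, the attention maps give the attribute-conditioned spatial focus described in the abstract, while the thresholded labels drive the iterative mask refinement; how the two are combined into a segmentation head is left to the paper body.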