Abstract: Most existing weakly-supervised segmentation methods rely on class activation maps (CAM) to generate pseudo-labels for training segmentation models. However, CAM has been criticized for highlighting only the most discriminative parts of the object, leading to poor quality of pseudo-labels. Although some recent methods have attempted to extend CAM to cover more areas, the fundamental problem still needs to be solved. We believe this problem is due to the huge gap between image-level labels and pixel-level predictions and that additional information must be introduced to address this issue. Thus, we propose a text-prompting-based weakly supervised segmentation method (TPRO), which uses text to introduce additional information. TPRO employs a vision and label encoder to generate a similarity map for each image, which serves as our localization map. Pathological knowledge is gathered from the internet and embedded as knowledge features, which are used to guide the image features through a knowledge attention module. Additionally, we employ a deep supervision strategy to utilize the network’s shallow information fully. Our approach outperforms other weakly supervised segmentation methods on benchmark datasets LUAD-HistoSeg and BCSS-WSSS datasets, setting a new state of the art. Code is available at: https://github.com/zhangst431/TPRO .
0 Replies
Loading