Abstract: Semantic segmentation is a long-standing focus of computer vision research. To improve the performance of diffusion models under weak supervision, this paper proposes WSSS-DM+, an improved weakly supervised semantic segmentation method. WSSS-DM+ adds a DC module to the original model architecture based on the cross-attention mechanism and applies several model optimization strategies, which together improve the quality of the generated masks. The method also establishes more accurate associations between text and image regions, providing a visual explanation of the text-image diffusion model. Experiments show that the method generalizes across different texts. Compared with the WSSS-DM method, the new approach effectively addresses the problem of coarse masks, improving the mIoU and mAcc metrics by 4.7 and 5.8, respectively, which confirms the effectiveness of the proposed enhancements.
External IDs: doi:10.1007/978-981-96-6948-6_6