Abstract: Large-scale unsupervised semantic segmentation (LUSS) aims to segment semantically similar regions of an image without relying on labeled training data. Although existing methods have made substantial progress, there is still room for improvement. We therefore introduce PASS-SAM, a model that combines the strengths of several models to improve segmentation performance. Specifically, we enhance a baseline model with self-attention and external attention modules. In the fine-tuning phase, we use conditional random fields (CRF) and the Segment Anything Model (SAM) to refine and retrain the baseline model. During inference, we use a model ensemble to blend predictions from different models, further improving segmentation accuracy. This approach won first place in the LUSS track of the Third Jittor Artificial Intelligence Challenge. Our model, implemented in the Jittor framework, is publicly available at https://github.com/PGSmall/jittor-PGSmall-LUSS.
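The ensemble step mentioned above can be illustrated with a minimal sketch. The abstract does not state how predictions are blended, so the weighted averaging of per-pixel class probabilities shown here is an assumption, and the function name `ensemble_segmentation`, its `weights` parameter, and the nested-list layout `[class][row][column]` are hypothetical choices made for illustration only. The sketch uses plain Python rather than Jittor so that it stays self-contained.

```python
from typing import List, Optional, Sequence

# One model's output: probs[c][y][x] is the probability of class c at pixel (y, x).
ProbMap = List[List[List[float]]]


def ensemble_segmentation(
    prob_maps: Sequence[ProbMap],
    weights: Optional[Sequence[float]] = None,
) -> List[List[int]]:
    """Blend per-pixel class probabilities from several models.

    Assumption: blending is a weighted average followed by a per-pixel
    argmax. This is one common ensemble scheme, not necessarily the one
    used by PASS-SAM.
    """
    if not prob_maps:
        raise ValueError("at least one probability map is required")
    if weights is None:
        weights = [1.0] * len(prob_maps)  # uniform weights by default
    if len(weights) != len(prob_maps):
        raise ValueError("one weight per model is required")

    total = float(sum(weights))
    num_classes = len(prob_maps[0])
    height = len(prob_maps[0][0])
    width = len(prob_maps[0][0][0])

    labels: List[List[int]] = []
    for y in range(height):
        row: List[int] = []
        for x in range(width):
            # Weighted mean probability of each class at this pixel.
            scores = [
                sum(w * m[c][y][x] for w, m in zip(weights, prob_maps)) / total
                for c in range(num_classes)
            ]
            # The predicted label is the class with the highest blended score.
            row.append(max(range(num_classes), key=scores.__getitem__))
        labels.append(row)
    return labels


# Two hypothetical models, two classes, a 1x2 image.
model_a = [[[0.9, 0.4]], [[0.1, 0.6]]]
model_b = [[[0.4, 0.1]], [[0.6, 0.9]]]
blended = ensemble_segmentation([model_a, model_b])
```

With uniform weights, the two hypothetical models disagree on the first pixel and the more confident one decides the label. Passing `weights` lets a stronger model dominate the vote.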
External IDs: dblp:journals/cvm/TangCPW25