Robust Probabilistic Unsupervised Segmentation with Uncertainty Modeling

19 Sept 2024 (modified: 05 Feb 2025) · Submitted to ICLR 2025 · CC BY 4.0
Keywords: Unsupervised Image Segmentation, Probabilistic Unsupervised Segmentation
TL;DR: We propose an end-to-end probabilistic self-supervised transformer with a novel loss function to improve the stability and robustness of unsupervised semantic segmentation.
Abstract:

Unsupervised semantic segmentation aims to assign a semantic label to each pixel in an image, identifying its object or scene class without any supervision. The task is particularly difficult because of unclear or overlapping boundaries, intricate object textures, and the presence of multiple objects within the same region. Traditional unsupervised models often suffer from class misalignment and poor spatial coherence, which leads to fragmented and imprecise segmentations, and they frequently rely on post-processing with Conditional Random Fields (CRFs) to improve their results. Moreover, deterministic models cannot capture prediction uncertainty, which makes their outputs especially error-prone in ambiguous regions. To address these issues, we propose a probabilistic unsupervised semantic segmentation framework that improves the robustness and accuracy of segmentation by refining predictions through uncertainty modeling and spatial smoothing. We also introduce a novel loss function that encourages the model to learn pixel-level similarities by leveraging features from pre-trained vision transformer backbones, and we provide a theoretical analysis of this loss that highlights properties favorable to optimizing our models. Our method achieves superior accuracy and calibration, outperforming various baselines on multiple unsupervised semantic segmentation benchmarks, including COCO, Potsdam, and Cityscapes. Overall, our framework lays a foundation for more reliable, uncertainty-aware segmentation models and advances research in unsupervised semantic segmentation.
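The abstract does not say how prediction uncertainty is quantified. As a generic illustration only (not the authors' method), one common way to turn a probabilistic segmentation model's outputs into a per-pixel uncertainty map is the predictive entropy of the class distribution averaged over several stochastic samples. The sketch below assumes that S sampled logit vectors are available for each pixel, for example from a hypothetical stochastic segmentation head; the function names `softmax`, `predictive_entropy`, and `uncertainty_map` are illustrative and do not come from the paper.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one pixel's K class/cluster logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predictive_entropy(sampled_logits: Sequence[Sequence[float]]) -> float:
    """Entropy of the mean class distribution over S stochastic samples for one pixel.

    sampled_logits holds S logit vectors of length K, e.g. from S forward
    passes of a hypothetical stochastic segmentation head (an assumption).
    """
    probs = [softmax(logits) for logits in sampled_logits]
    k = len(probs[0])
    # Average the per-sample class distributions (Monte Carlo estimate).
    mean = [sum(p[c] for p in probs) / len(probs) for c in range(k)]
    # Shannon entropy in nats; skip zero-probability classes to avoid log(0).
    return -sum(p * math.log(p) for p in mean if p > 0.0)


def uncertainty_map(samples: Sequence[Sequence[Sequence[float]]]) -> List[float]:
    """samples[s][i] is the logit vector of pixel i in sample s.

    Returns one entropy value per pixel (a flattened uncertainty map).
    """
    n_samples = len(samples)
    n_pixels = len(samples[0])
    return [
        predictive_entropy([samples[s][i] for s in range(n_samples)])
        for i in range(n_pixels)
    ]


if __name__ == "__main__":
    # Two samples, two pixels, three classes.
    # Pixel 0: both samples agree strongly on class 0 (low uncertainty).
    # Pixel 1: the samples disagree between classes 0 and 1 (high uncertainty).
    samples = [
        [[10.0, 0.0, 0.0], [5.0, 0.0, 0.0]],
        [[10.0, 0.0, 0.0], [0.0, 5.0, 0.0]],
    ]
    print(uncertainty_map(samples))
```

Pixels whose samples disagree, such as those on unclear or overlapping boundaries, receive high entropy (bounded above by log K), while consistently predicted pixels stay near zero; a map like this is one way to flag the ambiguous regions the abstract refers to.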

Primary Area: applications to computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 1854