CoT-Seg: Rethinking Reasoning Segmentation with Chain of Thoughts and Auto Correction

03 Sept 2025 (modified: 13 Nov 2025) | ICLR 2026 Conference Withdrawn Submission | CC BY 4.0
Keywords: Multimodal Large Language Model, Reasoning Segmentation
Abstract: Reasoning segmentation is an emerging vision-language task that requires generating a segmentation mask from implicit and often ambiguous language queries, enabled by recent advances in Multimodal Large Language Models (MLLMs). However, state-of-the-art training-based approaches often fail in challenging cases that demand higher-level reasoning or external knowledge. In this work, we introduce CoT-Seg, a training-free framework that rethinks reasoning segmentation by combining chain-of-thought reasoning with self-correction. Instead of finetuning, CoT-Seg leverages the inherent reasoning ability of pre-trained MLLMs (e.g., GPT-4o) to decompose queries into meta-instructions, extract fine-grained semantics from images, and identify target objects even under implicit or complex prompts. Crucially, CoT-Seg incorporates a self-correction stage: the model evaluates its own segmentation against the original query and reasoning trace, identifies mismatches, and iteratively refines the mask. This tight integration of reasoning and correction significantly improves reliability and robustness, especially in ambiguous or error-prone cases. Furthermore, we extend CoT-Seg with retrieval-augmented reasoning, enabling the system to access external knowledge when the input lacks sufficient information, further enhancing segmentation accuracy. Extensive experiments on ReasonSeg and RefCOCO demonstrate that CoT-Seg consistently outperforms existing baselines while remaining training-free. Our results highlight that combining chain-of-thought reasoning, self-correction, and retrieval augmentation offers a powerful paradigm for advancing reasoning-driven segmentation.
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 1766
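
The decompose, segment, evaluate, and refine loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function `segment_with_self_correction`, the `Critique` type, the `max_rounds` parameter, and the toy stand-ins for the MLLM and the segmenter are all hypothetical names introduced here, and retrieval augmentation is left out.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Critique:
    """Hypothetical result of the self-evaluation step."""
    matches_query: bool
    feedback: str = ""


def segment_with_self_correction(
    image,
    query: str,
    decompose: Callable,   # query -> list of meta-instructions (assumed MLLM call)
    segment: Callable,     # (image, hints) -> mask (assumed segmenter call)
    evaluate: Callable,    # (image, query, steps, mask) -> Critique (assumed MLLM call)
    max_rounds: int = 3,   # assumed cap on correction rounds
):
    steps = decompose(query)          # chain-of-thought decomposition
    hints: List[str] = list(steps)
    mask = segment(image, hints)      # initial mask
    for _ in range(max_rounds):
        critique = evaluate(image, query, steps, mask)
        if critique.matches_query:    # mask agrees with query and reasoning trace
            break
        hints.append(critique.feedback)   # feed the mismatch back in
        mask = segment(image, hints)      # refine the mask
    return mask


# Toy stand-ins so the sketch runs end to end: the "image" is a list of
# object labels and a "mask" is a list of booleans over those labels.
def toy_decompose(query):
    return ["find a container used for drinking"]


def toy_segment(image, hints):
    excluded = {h[4:] for h in hints if h.startswith("not ")}
    chosen = next(label for label in image if label not in excluded)
    return [label == chosen for label in image]


def toy_evaluate(image, query, steps, mask):
    chosen = image[mask.index(True)]
    if chosen == "cup":
        return Critique(True)
    return Critique(False, "not " + chosen)


toy_image = ["cat", "dog", "cup"]
result = segment_with_self_correction(
    toy_image, "the thing you drink from",
    toy_decompose, toy_segment, toy_evaluate,
)
print(result)
```

Passing the three stages in as callables keeps the loop independent of any particular MLLM or segmentation backend, which fits the training-free framing: only the prompts and the feedback passed between stages change, not model weights.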