Keywords: Language-Guided Segmentation, MLLMs, SAM
TL;DR: In this paper, we propose Seg-Agent, a completely training-free language-guided segmentation method.
Abstract: Language-guided segmentation moves beyond the fixed category sets of traditional semantic segmentation, enabling models to segment any target region in an image based on user instructions. Existing methods are typically two-stage frameworks: they first employ multimodal large language models (MLLMs) to interpret the textual instruction and generate visual prompts from the image, and then use foundation segmentation models such as SAM to produce high-quality masks. However, because of the base models' limited spatial grounding ability, these frameworks usually require training on large-scale datasets to achieve improved segmentation accuracy. In this paper, we propose Seg-Agent, a completely training-free language-guided segmentation method. By constructing an explicit reasoning chain of generation, selection, and refinement, Seg-Agent achieves performance comparable to training-based approaches. Additionally, to evaluate the generalization ability of Seg-Agent, we collect a diverse dataset covering various language-guided segmentation scenarios, named Various-LangSeg. Extensive experiments demonstrate the effectiveness of our proposed method. The code and dataset will be made publicly available.
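The generation-selection-refinement chain described in the abstract can be sketched as a simple pipeline. All function names, candidate formats, and the scoring rule below are illustrative assumptions for exposition only; the paper's actual prompting and model interfaces are not specified here.

```python
# Hypothetical sketch of a generation -> selection -> refinement chain
# for language-guided segmentation. Names and logic are assumptions,
# not the authors' implementation.

def generate_prompts(instruction, image):
    # Stage 1 (generation): an MLLM would propose candidate visual
    # prompts (e.g. bounding boxes) for the instruction. Stubbed here
    # as fixed candidate boxes (x0, y0, x1, y1).
    return [(10, 10, 50, 50), (12, 8, 40, 40), (100, 100, 160, 150)]

def select_prompt(candidates):
    # Stage 2 (selection): score candidates and keep the best one.
    # Box area stands in for an MLLM's quality judgment here.
    def area(box):
        x0, y0, x1, y1 = box
        return (x1 - x0) * (y1 - y0)
    return max(candidates, key=area)

def refine_mask(prompt, image):
    # Stage 3 (refinement): a promptable segmentation model such as
    # SAM would turn the prompt into a high-quality mask; stubbed as
    # returning the box itself.
    return {"mask_box": prompt, "refined": True}

def seg_agent(instruction, image=None):
    # Training-free pipeline: each stage calls frozen models; no
    # parameters are updated anywhere in the chain.
    candidates = generate_prompts(instruction, image)
    best = select_prompt(candidates)
    return refine_mask(best, image)

result = seg_agent("the red cup on the table")
print(result)  # largest candidate box is selected and "refined"
```

The key property the sketch illustrates is that every stage queries an off-the-shelf model, so the chain requires no fine-tuning.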
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 6893