Latent-Space Reinforcement Learning for Image Segmentation

16 Sept 2025 (modified: 11 Feb 2026)Submitted to ICLR 2026EveryoneRevisionsBibTeXCC BY 4.0
Keywords: image segmentation
Abstract: Policy-gradient reinforcement learning is a theoretically grounded and empirically effective algorithm for boosting the performance of LLMs and MLLMs, while its adaptation to conventional vision tasks such as dense prediction remains marginal. In response, this work introduces a latent-space reinforcement learning framework designed for image segmentation with task-specific model architectures, aiming to investigate whether the advantages conferred by reinforcement learning in LLMs and MLLMs, including improved predictive performance, mitigation of forgetting and enhanced generalization, can be effectively transferred to conventional dense prediction tasks. The designed framework is instantiated with a latent-space policy network for feature representation modulation, a stabilized advantage formulation that underpins reliable policy updates, a segmentation-aligned reward formulation that quantifies segmentation quality, and a hybrid loss to enhance training stability and learning efficiency. The effectiveness of our proposed framework is validated through integration with widely used semantic segmentation models and empirical evaluation under cross-domain and continual learning settings. Across diverse and challenging benchmarks, the proposed framework delivers consistent performance gains, demonstrating its practical efficacy and highlighting its potential for broader application in future research.
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 7283
Loading