VLM-R3: Region Recognition, Reasoning, and Refinement for Enhanced Multimodal Chain-of-Thought.

Chaoya Jiang, Yongrui Heng, Wei Ye, Han Yang, Haiyang Xu, Ming Yan, Ji Zhang, Fei Huang, Shikun Zhang

22 Jan 2026 | CoRR 2025 | Everyone | CC BY-SA 4.0