SynthVLM: Towards High-Quality and Efficient Synthesis of Image-Caption Datasets for Vision-Language Models

Zheng Liu, Hao Liang, Bozhou Li, Wentao Xiong, Chong Chen, Conghui He, Wentao Zhang, Bin Cui

Published: 27 Oct 2025, Last Modified: 16 Mar 2026 | Crossref | Everyone | CC BY-SA 4.0

External IDs: doi:10.1145/3746027.3758222