SynthVLM: Towards High-Quality and Efficient Synthesis of Image-Caption Datasets for Vision-Language Models

Zheng Liu, Hao Liang, Bozhou Li, Wentao Xiong, Chong Chen, Conghui He, Wentao Zhang, Bin Cui

Published: 27 Oct 2025, Last Modified: 16 Mar 2026, Crossref, CC BY-SA 4.0