Linguistic Image Understanding

23 Sept 2023 (modified: 25 Mar 2024), ICLR 2024 Conference Withdrawn Submission
Keywords: vision-language, text-centric, dataset refinement
TL;DR: A novel, text-centric vision-language framework called Linguistic Image Understanding.
Abstract: We present Linguistic Image Understanding (LIU), a novel, text-centric vision-language framework. LIU introduces a pipeline for image-text processing that transforms each image into a comprehensive textual description capturing both detailed object semantics and the spatial positions of objects within the image, which strengthens visual grounding. LIU then feeds these descriptions into pretrained large language models, which handle vision-language tasks without seeing the image itself. This approach achieves promising performance on many vision-language tasks with high computational efficiency and improved interpretability. Experimental results also show that LIU can refine existing vision-language pretraining datasets, yielding significantly improved Image-Text Matching scores. Vision-language models fine-tuned on these refined datasets in turn show performance improvements across a broad range of vision-language tasks. Our work points to a promising future in which combining advanced language models with semantically rich textual descriptions drives the development of more efficient and adaptable vision-language models.
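Illustrative sketch (not from the submission): the image-to-text step described in the abstract could look roughly like the code below. It assumes object detections are already available as labels, attributes, and normalized bounding boxes. The `Detection` class, the 3x3 grid used to name positions, and the prompt template are all hypothetical choices made for illustration, and no actual detector or language model is called.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Detection:
    """Hypothetical container for one detected object (an assumption, not LIU's format)."""
    label: str
    box: Tuple[float, float, float, float]  # normalized (x0, y0, x1, y1), origin at top left
    attributes: List[str] = field(default_factory=list)


def region(box: Tuple[float, float, float, float]) -> str:
    """Map a box center to a coarse 3x3 grid cell such as 'bottom left'."""
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    col = "left" if cx < 1 / 3 else "center" if cx < 2 / 3 else "right"
    row = "top" if cy < 1 / 3 else "middle" if cy < 2 / 3 else "bottom"
    if row == "middle" and col == "center":
        return "center"
    return f"{row} {col}"


def describe(detections: List[Detection]) -> str:
    """Turn detections into one sentence covering object semantics and positions."""
    if not detections:
        return "The image contains no detected objects."
    parts = []
    for d in detections:
        name = " ".join(d.attributes + [d.label])
        parts.append(f"a {name} in the {region(d.box)} of the image")
    return "The image contains " + "; ".join(parts) + "."


def build_prompt(description: str, question: str) -> str:
    """Assemble a text-only prompt; the language model never sees the pixels."""
    return f"Image description: {description}\nQuestion: {question}\nAnswer:"


detections = [
    Detection("dog", (0.0, 0.5, 0.3, 1.0), ["brown"]),
    Detection("ball", (0.4, 0.4, 0.6, 0.6)),
]
prompt = build_prompt(describe(detections), "Where is the dog?")
```

The resulting `prompt` string is what would be passed to a pretrained language model of the reader's choice. A coarse grid is only one way to verbalize position; relative phrases ("left of the ball") or raw coordinates are equally plausible alternatives.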
Supplementary Material: zip
Primary Area: representation learning for computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7519