GIT: A Generative Image-to-text Transformer for Vision and Language

Published: 2022, Last Modified: 05 Nov 2023
Trans. Mach. Learn. Res. 2022
Readers: Everyone
Abstract: In this paper, we design and train a Generative Image-to-text Transformer, GIT, to unify vision-language tasks such as image/video captioning and question answering. While generative models provide...
