Multi-Grained Vision Language Pre-Training: Aligning Texts with Visual Concepts

Yan Zeng, Xinsong Zhang, Hang Li

2022 (modified: 24 Apr 2023), ICML 2022, Readers: Everyone

Abstract: Most existing methods in vision language pre-training rely on object-centric features extracted through object detection and make fine-grained alignments between the extracted features and texts. I...
