CLVIN: Complete language-vision interaction network for visual question answering

Published: 01 Jan 2023, Last Modified: 21 May 2025. Knowl. Based Syst. 2023. License: CC BY-SA 4.0
Abstract: Highlights
• Show that incomplete interactions limit the rationality of token distribution.
• Design CLVIN, a model in quadratic E-D mode, to realize a reasonable token distribution.
• Propose CLVIN-c to further improve model size and performance.
• Achieve significant or comparable performance gains relative to some existing SOTA methods.