Enhancing vision-language contrastive representation learning using domain knowledge

Published: 2025, Last Modified: 10 Nov 2025. Comput. Vis. Image Underst. 2025. CC BY-SA 4.0
Abstract: Highlights
• Rich semantics can be developed in representation learning by leveraging.
• The inter-data relationships can be modeled using external knowledge.
• Additional cross-modal alignment leads to better visual understanding.
• Fine-grained inter-data similarity can serve as soft targets for cross-modal alignment.
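The last highlight, using inter-data similarity as soft targets for cross-modal alignment, can be illustrated with a small sketch. This is not the paper's implementation: the function names, the temperatures `tau` and `tau_target`, and the `knowledge_sim` matrix (a hypothetical pairwise similarity between samples derived from external knowledge) are all assumptions made for illustration. The sketch replaces the one-hot targets of a standard image-to-text contrastive loss with a softmax over knowledge-derived similarities.

```python
import math


def softmax(row, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [v / temperature for v in row]
    peak = max(scaled)
    exps = [math.exp(v - peak) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def soft_target_alignment_loss(image_emb, text_emb, knowledge_sim,
                               tau=0.07, tau_target=0.1):
    """Image-to-text cross-entropy against soft targets.

    knowledge_sim[i][j] is an assumed similarity between samples i and j
    obtained from external knowledge (hypothetical input, not from the paper).
    """
    n = len(image_emb)
    loss = 0.0
    for i in range(n):
        # Predicted distribution over texts for image i.
        logits = [cosine(image_emb[i], text_emb[j]) for j in range(n)]
        pred = softmax(logits, tau)
        # Soft target distribution from knowledge-based similarity.
        target = softmax(knowledge_sim[i], tau_target)
        loss -= sum(t * math.log(p) for t, p in zip(target, pred))
    return loss / n


# Toy usage: two image/text pairs with an identity knowledge matrix.
images = [[1.0, 0.0], [0.0, 1.0]]
texts = [[1.0, 0.0], [0.0, 1.0]]
knowledge = [[1.0, 0.0], [0.0, 1.0]]
example_loss = soft_target_alignment_loss(images, texts, knowledge)
```

With an identity `knowledge_sim` and a very small `tau_target`, the targets approach one-hot vectors and the loss approaches the usual one-directional contrastive loss; off-diagonal similarity spreads target mass onto related samples instead of treating them as pure negatives.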