Bridging the Domain Gap by Clustering-based Image-Text Graph Matching

ICML 2023 Workshop SCIS Submission 12 Authors

Published: 20 Jun 2023, Last Modified: 21 Sept 2023
Venue: SCIS 2023 Poster
Keywords: Domain Generalization, Multimodal
TL;DR: We propose a novel method that trains domain-invariant visual representations by aligning image features with textual descriptions through a clustering-based graph-matching algorithm.
Abstract: Learning domain-invariant representations is important for training a model that generalizes well to unseen domains. To this end, we propose a novel approach that leverages the semantic structures inherent in text descriptions as effective pivot embeddings for domain generalization. Specifically, we use graph representations of images and their associated textual descriptions to obtain domain-invariant pivot embeddings that capture the underlying semantic relationships between local image and text descriptors. Our approach involves a clustering-based graph-matching algorithm that matches graph-based image node features to textual graphs. Experimental results show the efficacy of the proposed method in enhancing the generalization ability of the model.
Submission Number: 12
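
The abstract names a clustering-based graph-matching step but does not spell out its details, so the following is only a minimal illustrative sketch, not the authors' implementation. It assumes that image node features and text node embeddings already live in a shared embedding space, and it swaps in plain k-means (with farthest-point initialization) for the clustering and a greedy one-to-one cosine-similarity assignment for the matching. All function names (`kmeans`, `match_clusters_to_text`, `alignment_loss`) are hypothetical.

```python
import math


def sqdist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def kmeans(points, k, iters=20):
    """Plain k-means with deterministic farthest-point initialization.

    ASSUMPTION: k-means stands in for whatever clustering the paper uses.
    """
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from all centroids chosen so far.
        far = max(points, key=lambda p: min(sqdist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each image node to its nearest centroid.
        labels = [min(range(k), key=lambda c: sqdist(p, centroids[c]))
                  for p in points]
        # Recompute each centroid as the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def match_clusters_to_text(centroids, text_nodes):
    """Greedy one-to-one matching of cluster centroids to text nodes.

    ASSUMPTION: greedy cosine matching is an illustrative stand-in for the
    paper's graph-matching procedure.
    """
    pairs = sorted(
        ((cosine(c, t), i, j)
         for i, c in enumerate(centroids)
         for j, t in enumerate(text_nodes)),
        reverse=True,
    )
    used_clusters, used_text, matching = set(), set(), {}
    for _, i, j in pairs:
        if i not in used_clusters and j not in used_text:
            matching[i] = j
            used_clusters.add(i)
            used_text.add(j)
    return matching


def alignment_loss(centroids, text_nodes, matching):
    """Mean (1 - cosine similarity) over matched centroid/text-node pairs."""
    total = sum(1.0 - cosine(centroids[i], text_nodes[j])
                for i, j in matching.items())
    return total / len(matching)


# Toy usage: six 2-D image node features forming two groups, two text nodes.
image_nodes = [[1.0, 0.1], [0.9, 0.0], [1.1, 0.2],
               [0.1, 1.0], [0.0, 0.9], [0.2, 1.1]]
text_nodes = [[0.0, 1.0], [1.0, 0.0]]
centroids, labels = kmeans(image_nodes, k=2)
matching = match_clusters_to_text(centroids, text_nodes)
loss = alignment_loss(centroids, text_nodes, matching)
```

In a training setup, a loss of this kind would pull clustered image features toward their matched text nodes, which is one way the text graph could act as a domain-invariant pivot. An optimal assignment (for example, the Hungarian algorithm) could replace the greedy matching if a globally best one-to-one mapping is needed.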