Using Language to Extend to Unseen DomainsDownload PDF

Published: 01 Feb 2023, Last Modified: 17 Sept 2023ICLR 2023 notable top 25%Readers: Everyone
Keywords: vision and language, robust training, domain adaptation
TL;DR: Transforming multimodal embeddings with language improves accuracy on an unseen domain.
Abstract: It is expensive to collect training data for every possible domain that a vision model may encounter when deployed. We instead consider how simply $\textit{verbalizing}$ the training domain (e.g.``photos of birds'') as well as domains we want to extend to but do not have data for (e.g.``paintings of birds'') can improve robustness. Using a multimodal model with a joint image and language embedding space, our method $\textit{LADS}$ learns a transformation of the image embeddings from the source domain to each target domain, while preserving task relevant information. Without using any images from the target domain, we show that over the $\textit{extended}$ domain containing both source and target, $\textit{LADS}$ outperforms standard fine-tuning and ensemble approaches over a suite of 4 benchmarks targeting domain adaptation and dataset bias.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Supplementary Material: zip
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Applications (eg, speech processing, computer vision, NLP)
Community Implementations: [![CatalyzeX](/images/catalyzex_icon.svg) 1 code implementation](
9 Replies