Unsupervised Domain Adaptation within Deep Foundation Latent Spaces

ICLR 2024 Workshop ME-FoMo Submission 34

Published: 04 Mar 2024, Last Modified: 04 May 2024
ME-FoMo 2024 Poster
License: CC BY 4.0
Keywords: domain adaptation; vision transformers
TL;DR: ViT-based foundation models substantially improve performance on unsupervised domain adaptation without specialised finetuning.
Abstract: Vision transformer-based foundation models, such as ViT or DINOv2, aim to solve problems with little or no finetuning of their features. Using a prototypical-network setting, we analyse to what extent such foundation models can solve unsupervised domain adaptation without finetuning on either the source or the target domain. Through quantitative analysis, as well as qualitative interpretation of the decision making, we demonstrate that the suggested method improves upon existing baselines, and we highlight the limitations of this approach that remain to be solved. The code is available at: https://github.com/lira-centre/vit_uda/
Submission Number: 34
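
The core recipe described in the abstract, nearest-prototype classification over frozen foundation-model features, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' exact pipeline: it assumes source and target features have already been extracted with a frozen ViT/DINOv2 backbone, and the function name `prototypical_predict`, the toy dimensions, and the cosine-similarity choice are ours for illustration.

```python
import torch
import torch.nn.functional as F


def prototypical_predict(source_feats, source_labels, target_feats, num_classes):
    """Nearest-prototype classification of target-domain features.

    Class prototypes are the mean of L2-normalised source features per class;
    each target sample is assigned to the class of the cosine-closest
    prototype. The backbone is never finetuned in this setting.
    """
    source_feats = F.normalize(source_feats, dim=-1)
    target_feats = F.normalize(target_feats, dim=-1)

    # One prototype per class: average of that class's source features.
    prototypes = torch.stack([
        source_feats[source_labels == c].mean(dim=0)
        for c in range(num_classes)
    ])
    prototypes = F.normalize(prototypes, dim=-1)

    # Cosine similarity between target features and prototypes; argmax = prediction.
    logits = target_feats @ prototypes.T
    return logits.argmax(dim=-1)


if __name__ == "__main__":
    # Toy usage with random stand-ins for frozen ViT / DINOv2 embeddings.
    D, C = 384, 10                        # e.g. ViT-S/14 embedding size, 10 classes
    src_x = torch.randn(200, D)           # features of labelled source images
    src_y = torch.randint(0, C, (200,))   # source labels
    tgt_x = torch.randn(50, D)            # features of unlabelled target images
    preds = prototypical_predict(src_x, src_y, tgt_x, num_classes=C)
    print(preds.shape)                    # torch.Size([50])
```

In practice the random tensors above would be replaced by features from a frozen backbone applied to source and target images; the point of the sketch is only that adaptation reduces to prototype construction and nearest-prototype assignment, with no gradient updates on either domain.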