Enhancing Vision-Language Models Incorporating TSK Fuzzy System for Domain Adaptation

Published: 01 Jan 2024, Last Modified: 27 Sept 2024, Venue: FUZZ 2024, License: CC BY-SA 4.0
Abstract: Unsupervised Domain Adaptation (UDA) addresses the challenge of applying knowledge from a labeled source domain to tasks within an unlabeled target domain, where the two domains follow different data distributions. To tackle the significant uncertainty in the unlabeled target domain, fuzzy domain adaptation methods have been devised. However, existing methods focus heavily on visual information and overlook the textual information carried by class labels. Vision-language models have been developed to exploit information from both the visual and textual branches. Nonetheless, adapting vision-language models to UDA encounters two critical issues: (1) current methods tend to optimize only one branch, risking convergence to local optima, and (2) they insufficiently exploit cross-domain relationships. To address these issues and advance UDA, this paper proposes a method called VLM-TSK-DA, which enhances vision-language models by integrating Takagi-Sugeno-Kang (TSK) fuzzy systems. The TSK fuzzy system is employed as an image adapter to manage uncertainty during the transfer process, and its output is combined with the image features in a residual manner. Our method integrates the TSK fuzzy system with prompt learning, ensuring that the visual and textual branches are updated simultaneously to achieve a global optimum. Furthermore, we introduce a fuzzy c-means clustering loss function designed to leverage inherent cross-domain relationships: it reduces the distance between the target domain data and the source cluster centers with high membership values, thereby minimizing the distribution discrepancy. Empirical evaluations on real-world datasets validate the efficacy of the proposed method.
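The abstract does not give formulas, so the following is only a minimal NumPy sketch of the two ingredients it names, under stated assumptions rather than the paper's implementation. It assumes a first-order TSK system with Gaussian membership functions and per-rule linear consequents, a hypothetical blending ratio `alpha` for the residual connection, and the standard fuzzy c-means objective with fuzzifier `m`. All function and parameter names are illustrative.

```python
import numpy as np

def fcm_memberships(x, centers, m=2.0, eps=1e-12):
    """Standard fuzzy c-means memberships of each row of x to each center."""
    # Squared Euclidean distances, shape (n_samples, n_clusters).
    d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1) + eps
    # u_ik is proportional to d_ik^(-2/(m-1)); on squared distances the
    # exponent becomes -1/(m-1). Rows are normalized to sum to 1.
    inv = d2 ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True), d2

def fcm_loss(target_feats, source_centers, m=2.0):
    """Assumed loss: membership-weighted distance of target features to
    source cluster centers (the classic FCM objective, averaged)."""
    u, d2 = fcm_memberships(target_feats, source_centers, m)
    return float(((u ** m) * d2).sum(axis=1).mean())

def tsk_residual_adapter(feats, rule_centers, rule_widths, W, b, alpha=0.5):
    """First-order TSK adapter blended with the input features.

    feats: (n, d); rule_centers, rule_widths: (R, d);
    W: (R, d, d); b: (R, d); alpha: hypothetical residual ratio.
    """
    # Log firing strength of each rule: product of Gaussian memberships
    # across dimensions, computed in log space for stability.
    diff = feats[:, None, :] - rule_centers[None, :, :]
    log_fire = -(diff ** 2 / (2.0 * rule_widths[None] ** 2)).sum(axis=-1)
    log_fire = log_fire - log_fire.max(axis=1, keepdims=True)
    fire = np.exp(log_fire)
    fire = fire / fire.sum(axis=1, keepdims=True)  # normalized, (n, R)
    # Per-rule linear consequents, shape (n, R, d).
    conseq = np.einsum("nd,rde->nre", feats, W) + b[None]
    adapted = (fire[:, :, None] * conseq).sum(axis=1)
    # Residual combination of adapted and original features.
    return alpha * adapted + (1.0 - alpha) * feats
```

In this sketch, `fcm_loss` shrinks as target features move toward the source centers they already belong to most strongly, which is the behavior the abstract attributes to the clustering loss. In a real training setup the rule parameters and prompts would be learned jointly by gradient descent in an autodiff framework; NumPy is used here only to keep the arithmetic visible.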