Keywords: Unsupervised learning, Semi-supervised learning, Domain adaptation, Knowledge distillation, Graph learning
Abstract: In this paper, we explore $\textbf{how to enhance student network performance in knowledge distillation (KD) for domain adaptation (DA)}$. We identify two key factors impacting student performance under domain shift: $\textbf{(1) the capability of the teacher network}$ and $\textbf{(2) the effectiveness of the knowledge distillation strategy}$.
For the first factor, we integrate a Vision Transformer (ViT) as the feature extractor and our proposed Category-level Aggregation (CA) module as the classifier to construct the ViT+CA teacher network. This architecture leverages the ViT's ability to capture detailed representations of individual images. In addition, the CA module employs the message-passing mechanism of a graph convolutional network to strengthen intra-class relations and mitigate domain shift by aggregating samples that share class information; a sketch of such a module follows below.
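To make the CA idea concrete, here is a minimal PyTorch sketch of a GCN-style category-level aggregation classifier. The graph construction (a cosine-similarity adjacency over the mini-batch), the single propagation step, and all layer sizes are our assumptions for illustration; the abstract does not specify these details.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CategoryAggregation(nn.Module):
    """Hypothetical Category-level Aggregation (CA) classifier.

    Builds a batch-level graph from feature similarity and applies one
    round of GCN-style message passing, so samples with similar class
    information aggregate each other's features before classification.
    The adjacency construction below is an assumption, not the paper's.
    """

    def __init__(self, feat_dim: int, num_classes: int):
        super().__init__()
        self.propagate = nn.Linear(feat_dim, feat_dim)   # GCN weight W
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, D) ViT features for one mini-batch.
        normed = F.normalize(feats, dim=1)
        adj = (normed @ normed.t()).clamp(min=0)         # (B, B) similarities
        adj = adj + torch.eye(adj.size(0), device=adj.device)  # self-loops
        deg_inv_sqrt = adj.sum(dim=1).rsqrt()
        adj = deg_inv_sqrt[:, None] * adj * deg_inv_sqrt[None, :]  # D^-1/2 A D^-1/2
        # One message-passing step: H' = ReLU(A_hat @ H @ W).
        aggregated = F.relu(self.propagate(adj @ feats))
        return self.classifier(aggregated)
```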
For the second factor, we leverage pseudo labels generated by the ViT+CA teacher to guide the gradient updates of the student network's parameters, aligning the student's behavior with that of the teacher. For efficient inference and low computational cost, we use a convolutional neural network (CNN) for feature extraction and a multilayer perceptron (MLP) as the classifier to build the CNN+MLP student network; a sketch of this distillation step follows below. Extensive experiments on various DA datasets demonstrate that our method significantly surpasses current state-of-the-art approaches. Our code will be available soon.
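The following sketch shows one way such a pseudo-label distillation update could look in PyTorch. The temperature-scaled KL objective, the `tau` value, and the function names are assumptions for illustration; the abstract only states that the teacher's pseudo labels guide the student's gradient updates.

```python
import torch
import torch.nn.functional as F

def distill_step(teacher, student, images, optimizer, tau: float = 2.0):
    """One hypothetical KD update: align the CNN+MLP student with the
    ViT+CA teacher's soft pseudo labels on target-domain images."""
    with torch.no_grad():
        teacher_logits = teacher(images)   # teacher predictions as pseudo labels
    student_logits = student(images)
    # Temperature-scaled KL divergence between soft distributions.
    loss = F.kl_div(
        F.log_softmax(student_logits / tau, dim=1),
        F.softmax(teacher_logits / tau, dim=1),
        reduction="batchmean",
    ) * tau * tau
    optimizer.zero_grad()
    loss.backward()                        # teacher signal drives the student's gradients
    optimizer.step()
    return loss.item()
```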
Primary Area: transfer learning, meta learning, and lifelong learning
Submission Number: 8559