Abstract: Highlights
• Novel C2VKD framework for learning compact ViT student from CNN teacher.
• We present VLFD and PDD modules to distill visual and linguistic features.
• C2VKD achieves state-of-the-art performance on three segmentation datasets.