Distilling efficient Vision Transformers from CNNs for semantic segmentation

Published: 01 Jan 2025, Last Modified: 12 May 2025, Venue: Pattern Recognit. 2025, License: CC BY-SA 4.0
Abstract: Highlights
• Novel C2VKD framework for learning a compact ViT student from a CNN teacher.
• We present VLFD and PDD modules to distill visual and linguistic features.
• C2VKD achieves state-of-the-art performance on three segmentation datasets.