Zoom-shot: Fast, efficient and unsupervised zero-shot knowledge transfer from CLIP to vision encoders

Published: 2026, Last Modified: 15 Jan 2026 | Pattern Recognit. 2026 | CC BY-SA 4.0
Abstract: Highlights
• Efficiently transfers zero-shot capabilities from CLIP to pre-trained vision encoders.
• Improves performance over previous state-of-the-art work in this domain.
• Training data diversity and coverage improve mapping quality and zero-shot performance.
• Increased data coverage is achieved through tailored loss functions.
• Training uses only unlabelled, unpaired image and text data.