Zoom-shot: Fast, efficient and unsupervised zero-shot knowledge transfer from CLIP to vision encoders
Abstract: Highlights
• Efficiently transfers zero-shot capabilities from CLIP to pre-trained vision encoders.
• Improves performance over previous SOTA work in domain.
• Training data diversity/coverage improves mapping quality and zero-shot performance.
• Increased data coverage achieved through the use of tailored loss functions.
• Training data is entirely unlabelled and unpaired image and text data.
External IDs: dblp:journals/pr/ShipardWTXF26