Abstract: Highlights
• This survey examines CLIP’s potential, exploring its role in image–text alignment and clinical applications.
• We examine how CLIP enhances classification, dense prediction, and cross-modal tasks.
• We document the rapid growth of CLIP-focused studies and discuss current limitations and future directions.
• This survey offers a holistic understanding of CLIP’s implications, guiding researchers into this area.
External IDs: doi:10.1016/j.media.2025.103551