CLIPMulti: Explore the performance of multimodal enhanced CLIP for zero-shot text classification

Peng Wang, Dagang Li, Xuesi Hu, Yongmei Wang, Youhua Zhang

Published: 2025, Last Modified: 26 Jul 2025. Comput. Speech Lang. 2025. CC BY-SA 4.0

Abstract: Highlights
• Multimodality provides new ideas for improving zero-shot text classification.
• An image–text matching method can improve the efficiency of label transformation.
• Shallow and deep fusion methods are analyzed and their differences illustrated.
• CLIPMulti achieves competitive performance on zero-shot text classification tasks.