CLIPMulti: Explore the performance of multimodal enhanced CLIP for zero-shot text classification

Published: 01 Jan 2025, Last Modified: 26 Jul 2025, Comput. Speech Lang. 2025, CC BY-SA 4.0
Abstract: Highlights
• Multimodality provides new ideas for improving zero-shot text classification.
• Image–text matching method can improve efficiency for label transform.
• Analyzed shallow and deep fusion methods and illustrated their differences.
• CLIPMulti achieves competitive performance on zero-shot text classification tasks.