Abstract: Highlights
• Multimodality provides new ideas for improving zero-shot text classification.
• An image–text matching method can improve the efficiency of label transformation.
• Shallow and deep fusion methods are analyzed and their differences illustrated.
• CLIPMulti achieves competitive performance on zero-shot text classification tasks.