UMZero: A Unified CNN-Mamba Framework for Zero-Shot Learning

20 Sept 2025 (modified: 01 Feb 2026), ICLR 2026 Conference Withdrawn Submission, Readers: Everyone, License: CC BY 4.0
Keywords: Zero-shot learning (ZSL); State space models; Cross-modal interactions; Prototype contrastive
Abstract: In zero-shot learning (ZSL), recognizing unseen classes depends on modeling both local details and global correlations between the visual and semantic modalities. Prior methods use CNNs for local feature extraction, or Transformers and Mamba models for global context, but they often lack an integrated mechanism that models both jointly, which limits their ability to exploit complex cross-modal interactions. To address this, we propose UMZero, a hybrid ZSL framework that combines a pretrained CNN with a state space model (SSM), uniting fine-grained local feature extraction with long-range dependency modeling. UMZero comprises three core modules: the High-order Global Aggregator (HGA), the Mamba Interaction Module (MIM), and a prototype learning module. The HGA enhances feature expressiveness by capturing high-order statistical dependencies across channels. The MIM performs cross-modal fusion and alignment by jointly modeling local-global interactions and supporting bidirectional information flow between modalities. The prototype learning module constructs a semantically structured embedding space that promotes compact intra-class and discriminative inter-class representations. Evaluations on several well-established ZSL datasets show that UMZero outperforms current state-of-the-art methods and generalizes robustly. The source code will be publicly released upon paper acceptance.
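The abstract does not specify the prototype learning objective. As a purely illustrative sketch, one common way to encourage compact intra-class and discriminative inter-class embeddings is a softmax cross-entropy over cosine similarities between sample features and class prototypes. The function names, the `temperature` parameter, and the use of NumPy below are assumptions made for illustration; this is not the authors' implementation.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to (approximately) unit length along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def prototype_contrastive_loss(features, labels, prototypes, temperature=0.1):
    """Hypothetical prototype-based contrastive loss (illustrative only).

    features:   (N, D) sample embeddings
    labels:     (N,)   integer class indices into `prototypes`
    prototypes: (C, D) one embedding per class
    """
    f = l2_normalize(features)
    p = l2_normalize(prototypes)
    # Temperature-scaled cosine similarity to every class prototype.
    logits = f @ p.T / temperature
    # Subtract the row-wise max for numerical stability before exponentiating.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Negative log-likelihood of each sample's own class prototype.
    return -log_probs[np.arange(len(labels)), labels].mean()


# Toy check: features that coincide with their own prototypes incur a much
# smaller loss than features paired with the wrong prototypes.
toy_prototypes = np.eye(3)
toy_features = np.eye(3)
loss_aligned = prototype_contrastive_loss(toy_features, np.array([0, 1, 2]), toy_prototypes)
loss_misaligned = prototype_contrastive_loss(toy_features, np.array([1, 2, 0]), toy_prototypes)
```

Minimizing such a loss pulls each feature toward its own class prototype and pushes it away from the others. In a ZSL setting the prototypes would typically be derived from class semantic descriptions, which is again an assumption rather than something the abstract states.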
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 23305