Learning Hierarchical Hyperbolic Embeddings for Compositional Zero-Shot Learning

15 Sept 2025 (modified: 13 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Compositional Zero-Shot Learning, Hierarchical Learning, Hyperbolic Representation
Abstract: Compositional zero-shot learning (CZSL) aims to recognize unseen state-object compositions by generalizing from a training set of their primitives (states and objects). Current methods often overlook rich hierarchical structures, such as the semantic hierarchy among primitives (e.g., apple $\subset$ fruit) and the conceptual hierarchy between primitives and compositions (e.g., sliced apple $\subset$ apple). A few recent efforts have shown that modeling these hierarchies via loss regularization in Euclidean space is effective. In this paper, we argue that such approaches fail to scale to the large taxonomies required for real-world CZSL: the polynomial volume growth of flat Euclidean space cannot match the exponential growth of tree-like taxonomies, impairing generalization capacity. To this end, we propose $\text{H}^2$em, a new framework that learns Hierarchical Hyperbolic EMbeddings for CZSL. $\text{H}^2$em leverages the unique properties of hyperbolic geometry, a space naturally suited to embedding tree-like structures with low distortion. However, a naive hyperbolic mapping may suffer from hierarchical collapse and poor fine-grained discrimination. We therefore design two learning objectives to structure this space: a taxonomic entailment loss that uses hyperbolic entailment cones to enforce the predefined hierarchies, and a discriminative alignment loss with hard negative mining that enforces large geodesic distances between semantically similar compositions. Extensive experiments on three benchmarks demonstrate that $\text{H}^2$em establishes a new state of the art in both closed-world and open-world scenarios. Our code will be released.
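The abstract references geodesic distances and hyperbolic entailment cones; since the paper's code is not yet released, the following is only a rough, hypothetical sketch of that machinery, assuming the Poincaré ball model and the cone formulation of Ganea et al. (2018). All function names and the aperture constant `K` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    # Geodesic distance between points u, v in the Poincare ball (||x|| < 1).
    sq_dist = np.sum((u - v) ** 2)
    nu, nv = np.sum(u ** 2), np.sum(v ** 2)
    x = 1 + 2 * sq_dist / ((1 - nu) * (1 - nv) + eps)
    return np.arccosh(x)

def half_aperture(x, K=0.1, eps=1e-9):
    # Half-aperture of the entailment cone rooted at x; points closer to the
    # origin get wider cones, matching more general concepts (e.g., "fruit").
    s = K * (1 - np.sum(x ** 2)) / (np.linalg.norm(x) + eps)
    return np.arcsin(np.clip(s, -1.0, 1.0))

def cone_angle(x, y, eps=1e-9):
    # Angle at x between the geodesic to y and the ray from the origin through x
    # (closed form from Ganea et al., 2018).
    nx, ny = np.sum(x ** 2), np.sum(y ** 2)
    dot = np.dot(x, y)
    num = dot * (1 + nx) - nx * (1 + ny)
    denom = (np.linalg.norm(x) * np.linalg.norm(x - y)
             * np.sqrt(1 + nx * ny - 2 * dot) + eps)
    return np.arccos(np.clip(num / denom, -1.0, 1.0))

def entails(parent, child, K=0.1):
    # The hierarchy "child subset-of parent" holds when the child embedding
    # falls inside the parent's entailment cone.
    return cone_angle(parent, child) <= half_aperture(parent, K)
```

A taxonomic entailment loss could then penalize `max(0, cone_angle(parent, child) - half_aperture(parent))` for each predefined pair such as (apple, sliced apple), while a discriminative alignment loss would push `poincare_distance` apart for hard negatives.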
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 5366