Zero-shot Learning Using Multimodal Descriptions

2022 (modified: 06 Nov 2022) · CVPR Workshops 2022
Abstract: Zero-shot learning (ZSL) tackles the problem of recognizing unseen classes using only semantic descriptions, e.g., attributes. Current zero-shot learning techniques all assume that a single vector of attributes suffices to describe each category. We show that this assumption is incorrect. Many classes in real-world problems have multiple modes of appearance: male and female birds differ in appearance, for instance. Domain experts know this and can provide attribute descriptions of the chief modes of appearance for each class. Motivated by this, we propose the task of multimodal zero-shot learning, in which the learner must learn from these multimodal attribute descriptions. We present a technique for multimodal ZSL that significantly outperforms its unimodal counterpart. We posit that multimodal ZSL is more practical for real-world problems, where complex intra-class variation is common.
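To make the unimodal-vs-multimodal distinction concrete, here is a minimal sketch (not the paper's method) of compatibility-based ZSL scoring: in the unimodal setting each unseen class is described by one attribute vector, while in the multimodal setting a class carries one attribute vector per appearance mode and a test sample is scored against its best-matching mode. The class names, vectors, and cosine-similarity scoring below are illustrative assumptions.

```python
import numpy as np

def cosine(a, b):
    # cosine similarity between an image embedding and an attribute vector
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def predict_unimodal(x, class_attrs):
    # class_attrs: {class_name: single attribute vector} -- one mode per class
    return max(class_attrs, key=lambda c: cosine(x, class_attrs[c]))

def predict_multimodal(x, class_modes):
    # class_modes: {class_name: [per-mode attribute vectors]}
    # score each class by its best-matching appearance mode
    return max(class_modes,
               key=lambda c: max(cosine(x, m) for m in class_modes[c]))

# Toy example (hypothetical attributes): a class with two appearance modes
modes = {
    "cardinal": [np.array([1.0, 0.0, 0.9]),   # male: bright red plumage
                 np.array([0.3, 0.8, 0.2])],  # female: brownish plumage
    "bluejay":  [np.array([0.0, 0.2, -0.9])],
}
x = np.array([0.35, 0.75, 0.25])  # embedding resembling a female cardinal
print(predict_multimodal(x, modes))  # matches the female mode of "cardinal"
```

A unimodal description built from only the male mode would have to average or discard the female appearance; scoring against the best of several expert-provided modes, as above, is the simplest way to exploit a multimodal description.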