Abstract: Zero-shot learning (ZSL) aims to recognize novel classes
by transferring semantic knowledge from seen classes to
unseen classes. Because semantic knowledge is built on attributes shared across classes, and these attributes are highly
local, a strong prior for localizing object attributes is beneficial for visual-semantic embedding. Interestingly, when
recognizing unseen images, humans also automatically gaze at regions with certain semantic clues. Therefore, we introduce a novel goal-oriented gaze estimation
module (GEM) to improve discriminative attribute localization based on class-level attributes for ZSL. We
aim to predict actual human gaze locations to obtain the visual attention regions for recognizing a novel object guided
by its attribute descriptions. Specifically, the task-dependent
attention is learned with the goal-oriented GEM, and the
global image features are simultaneously optimized with
the regression of local attribute features. Experiments on
three ZSL benchmarks, i.e., CUB, SUN and AWA2, show
that our proposed method is superior to or competitive
with state-of-the-art ZSL methods. The ablation
analysis on the real gaze dataset CUB-VWSW also validates the
benefits and accuracy of our gaze estimation module. This
work suggests that collecting human
gaze datasets and developing automatic gaze estimation algorithms can benefit
high-level computer vision tasks. The code is available at
https://github.com/osierboy/GEM-ZSL.
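
As a rough illustration of the attribute-based transfer described above, the following minimal sketch shows how an image can be assigned to an unseen class by comparing predicted attribute scores with class-level attribute vectors. It is not the GEM architecture and is not taken from the linked repository; the function names, the attribute order, the class names, and all numeric values are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))


def predict_unseen_class(
    predicted_attributes: Sequence[float],
    class_attributes: Dict[str, List[float]],
) -> str:
    """Return the class whose attribute vector is most compatible
    (largest inner product) with the attributes predicted for an image."""
    return max(
        class_attributes,
        key=lambda name: dot(predicted_attributes, class_attributes[name]),
    )


# Hypothetical attribute order: [striped, has_wings, lives_in_water]
unseen_classes = {
    "zebra": [1.0, 0.0, 0.0],
    "penguin": [0.0, 1.0, 1.0],
}

# Hypothetical attribute scores that a visual model might regress from an image.
image_attributes = [0.9, 0.1, 0.2]

print(predict_unseen_class(image_attributes, unseen_classes))  # prints "zebra"
```

In this toy setting the class-level attribute vectors are the only information available about the unseen classes, which is why accurately localizing and regressing attributes from the image matters for the final prediction.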