Learning discriminative action and context representations for action recognition in still images

Miao Xin, Hong Zhang, Ding Yuan, Mingui Sun

Published: 2017, ICME 2017. Last Modified: 14 May 2023

Abstract: Action recognition in still images is a challenging task in computer vision. Recent successes in deep feature learning have advanced this research by providing robust, semantically rich feature representations. However, recognition still tends to fail when images of different actions share similar contexts, a long-standing problem. In this paper, we employ metric learning to address within-class and between-class confusions in action recognition. We propose a novel loss function, named the composite-triplet loss. Supervised by this loss function, our method learns a similarity function directly from data. Using a customized human action and context detection network, we obtain highly discriminative action image embeddings, which can be used for action image recognition and other tasks. We evaluate our approach on still-image action recognition and on image caption generation. On the PASCAL VOC dataset, our approach outperforms state-of-the-art methods, achieving 90.6% mean AP.
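The abstract does not give the form of the composite-triplet loss. As a point of reference only, the sketch below implements the standard triplet margin loss that triplet-based metric learning builds on; it is not the authors' loss. The function names, the squared Euclidean distance, and the margin of 0.2 are illustrative assumptions.

```python
from typing import Iterable, Sequence, Tuple

Embedding = Sequence[float]


def squared_distance(a: Embedding, b: Embedding) -> float:
    """Squared Euclidean distance between two embeddings of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor: Embedding, positive: Embedding,
                 negative: Embedding, margin: float = 0.2) -> float:
    """Standard triplet hinge loss (assumed form, not the composite-triplet loss).

    Returns max(0, d(a, p) - d(a, n) + margin): zero once the negative is
    farther from the anchor than the positive by at least `margin`.
    """
    d_ap = squared_distance(anchor, positive)
    d_an = squared_distance(anchor, negative)
    return max(0.0, d_ap - d_an + margin)


def batch_triplet_loss(triplets: Iterable[Tuple[Embedding, Embedding, Embedding]],
                       margin: float = 0.2) -> float:
    """Mean triplet loss over (anchor, positive, negative) triplets."""
    losses = [triplet_loss(a, p, n, margin) for a, p, n in triplets]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    anchor, positive = [0.0, 0.0], [0.1, 0.0]
    easy_negative = [1.0, 0.0]   # far from the anchor: no penalty
    hard_negative = [0.2, 0.0]   # close to the anchor: penalized
    print(triplet_loss(anchor, positive, easy_negative))             # 0.0
    print(round(triplet_loss(anchor, positive, hard_negative), 2))   # 0.17
```

A "hard" negative that sits almost as close to the anchor as the positive, such as an image of a different action taken in a similar context, is exactly the case that yields a nonzero loss and pushes the embeddings apart.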
