Unseen Action Recognition with Unpaired Adversarial Multimodal Learning

27 Sept 2018 (modified: 05 May 2023) · ICLR 2019 Conference Blind Submission
Abstract: In this paper, we present a method to learn a joint multimodal representation space that allows for the recognition of unseen activities in videos. We compare the effect of placing various constraints on the embedding space using paired text and video data. Additionally, we propose a method to improve the joint embedding space using an adversarial formulation with unpaired text and video data. Beyond testing on publicly available datasets, we introduce a new, large-scale text/video dataset. We experimentally confirm that learning such a shared embedding space benefits three difficult tasks: (i) zero-shot activity classification, (ii) unsupervised activity discovery, and (iii) unseen activity captioning.
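
The adversarial formulation with unpaired data described in the abstract can be illustrated with a minimal sketch: a discriminator tries to tell whether a shared-space embedding came from video or text, while the two encoders are trained to fool it, pulling the modalities into a common distribution without requiring paired examples. The PyTorch code below is not the authors' implementation; the encoder architectures, feature dimensions (e.g. pooled video CNN features, averaged word embeddings), and hyperparameters are all hypothetical assumptions.

```python
# Minimal sketch of adversarial alignment of unpaired text/video
# embeddings in a shared space. All names and dimensions are hypothetical.
import torch
import torch.nn as nn

EMB_DIM = 256

video_encoder = nn.Sequential(nn.Linear(2048, 512), nn.ReLU(), nn.Linear(512, EMB_DIM))
text_encoder = nn.Sequential(nn.Linear(300, 512), nn.ReLU(), nn.Linear(512, EMB_DIM))
# Discriminator predicts which modality an embedding came from (1 = video).
discriminator = nn.Sequential(nn.Linear(EMB_DIM, 128), nn.ReLU(), nn.Linear(128, 1))

bce = nn.BCEWithLogitsLoss()
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-4)
g_opt = torch.optim.Adam(
    list(video_encoder.parameters()) + list(text_encoder.parameters()), lr=1e-4
)

# Unpaired batches: the video and text samples need not correspond.
video_feats = torch.randn(32, 2048)  # e.g. pooled video CNN features
text_feats = torch.randn(32, 300)    # e.g. averaged word embeddings

for step in range(100):
    z_v = video_encoder(video_feats)
    z_t = text_encoder(text_feats)

    # 1) Train the discriminator to separate the two modalities.
    d_loss = bce(discriminator(z_v.detach()), torch.ones(32, 1)) + \
             bce(discriminator(z_t.detach()), torch.zeros(32, 1))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # 2) Train the encoders to fool the discriminator, making video and
    #    text embeddings indistinguishable in the shared space.
    g_loss = bce(discriminator(z_v), torch.zeros(32, 1)) + \
             bce(discriminator(z_t), torch.ones(32, 1))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
```

In a full system this adversarial term would be combined with a paired-data objective (such as a ranking or reconstruction loss) so that matching text and video land near each other, not merely in overlapping distributions.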
Data: [ActivityNet](https://paperswithcode.com/dataset/activitynet), [Charades](https://paperswithcode.com/dataset/charades), [HMDB51](https://paperswithcode.com/dataset/hmdb51), [MLB-YouTube Dataset](https://paperswithcode.com/dataset/mlb-youtube-dataset), [UCF101](https://paperswithcode.com/dataset/ucf101)