Vi2ACT: Video-Enhanced Cross-modal Co-learning with Representation Conditional Discriminator for Few-shot Human Activity Recognition
Abstract: Human Activity Recognition (HAR), as an emerging research field, has attracted widespread academic attention due to its wide range of practical applications in areas such as healthcare, environmental monitoring, and sports training. Given the high cost of annotating sensor data, many unsupervised and semi-supervised methods have been applied to HAR to alleviate the problem of limited data. In this paper, we propose a novel video-enhanced cross-modal co-learning method, Vi2ACT, to address few-shot HAR. We introduce a new data augmentation approach that uses a text-to-video generation model to generate class-related videos. A large number of video semantic representations is then obtained by fine-tuning the video encoder for cross-modal co-learning. Furthermore, to effectively align video semantic representations with time series representations, we enhance HAR at the representation level using conditional Generative Adversarial Nets (cGANs). We design a novel Representation Conditional Discriminator that is trained to distinguish, as accurately as possible, representations originating from the video encoder from those generated by the time series encoder. We conduct extensive experiments on four commonly used HAR datasets. The experimental results demonstrate that our method outperforms other baseline models in all few-shot scenarios.
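To make the representation-level alignment concrete, the following is a minimal sketch of a class-conditional discriminator in the spirit of the one described above. All names, dimensions, and the concatenation-based conditioning scheme are illustrative assumptions, not the paper's actual implementation: the discriminator receives a representation plus a one-hot class condition and outputs a logit scoring whether the representation came from the video encoder ("real") rather than the time series encoder ("fake").

```python
# Hypothetical sketch of a Representation Conditional Discriminator.
# Assumed: 128-dim representations, one-hot class conditioning via
# concatenation, a single hidden layer. Not the authors' actual code.
import torch
import torch.nn as nn


class RepresentationConditionalDiscriminator(nn.Module):
    """Scores whether a representation originates from the video encoder
    (real) or the time series encoder (fake), given a class condition."""

    def __init__(self, repr_dim: int, num_classes: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(repr_dim + num_classes, hidden),
            nn.LeakyReLU(0.2),
            nn.Linear(hidden, 1),  # one logit: video (real) vs. time series (fake)
        )

    def forward(self, z: torch.Tensor, y_onehot: torch.Tensor) -> torch.Tensor:
        # Condition by concatenating the representation with the class one-hot.
        return self.net(torch.cat([z, y_onehot], dim=-1))


# Illustrative usage with random inputs.
D = RepresentationConditionalDiscriminator(repr_dim=128, num_classes=6)
z = torch.randn(4, 128)                        # batch of 4 representations
y = torch.eye(6)[torch.tensor([0, 1, 2, 3])]   # one-hot class conditions
logits = D(z, y)                               # shape (4, 1)
```

In a cGAN training loop, these logits would feed a standard adversarial loss (e.g. binary cross-entropy with logits), pushing the time series encoder to produce class-specific representations that the discriminator cannot tell apart from the video encoder's.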
Primary Subject Area: [Content] Media Interpretation
Secondary Subject Area: [Content] Multimodal Fusion
Relevance To Conference: We propose a novel cross-modal co-learning approach to strengthen few-shot human activity recognition. We introduce a data augmentation approach that leverages a text-to-video generation model to generate label-related videos, and we design a representation conditional discriminator that guides the time series encoder to extract class-specific semantic representations under specific class conditions.
Submission Number: 5418