TS-ILM: Class-Incremental Learning for Online Action Detection

Published: 20 Jul 2024, Last Modified: 01 Aug 2024 · MM 2024 Poster · CC BY 4.0
Abstract: Online action detection aims to identify ongoing actions within untrimmed video streams and has extensive applications in real-life scenarios. In practice, however, video frames arrive sequentially over time and new action categories continually emerge, giving rise to the challenge of catastrophic forgetting, a problem that remains inadequately explored. In video understanding, researchers generally address catastrophic forgetting through class-incremental learning. Online action detection, however, relies solely on historical observations, and therefore demands stronger temporal modeling from class-incremental learning methods. In this paper, we conceptualize this task as Class-Incremental Online Action Detection (CIOAD) and propose a novel framework, TS-ILM, to address it. TS-ILM consists of two components: a task-level temporal pattern extractor and a temporal-sensitive exemplar selector. The former extracts and stores the temporal patterns of actions in different tasks, allowing the data to be comprehensively observed at a temporal level before it is fed into the backbone. The latter selects a set of frames with the highest causal relevance and minimum information redundancy for subsequent replay, enabling the model to learn the temporal information of previous tasks more effectively. We benchmark our approach against state-of-the-art class-incremental learning methods from the image and video domains on the THUMOS'14 and TVSeries datasets, and our method outperforms the previous approaches.
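The exemplar-selection idea described above (favoring frames with high causal relevance while penalizing redundancy) can be illustrated with a greedy sketch. This is a minimal, hypothetical illustration only: the function names, the cosine-similarity redundancy measure, and the `lam` trade-off weight are assumptions for exposition, not the paper's actual TS-ILM algorithm.

```python
# Hypothetical sketch: greedily pick exemplar frames that score high on a
# given relevance signal while penalizing similarity to frames already chosen.
# The relevance scores and lambda trade-off are illustrative assumptions.
import math

def cosine(a, b):
    # Cosine similarity between two feature vectors (0.0 if either is zero).
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def select_exemplars(features, relevance, k, lam=0.5):
    """Greedy selection: at each step pick the frame maximizing
    relevance minus lam * (max similarity to the already-selected set),
    which trades relevance against information redundancy."""
    selected = []
    candidates = list(range(len(features)))
    while candidates and len(selected) < k:
        def score(i):
            redundancy = max(
                (cosine(features[i], features[j]) for j in selected),
                default=0.0,
            )
            return relevance[i] - lam * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

For example, given two near-duplicate high-relevance frames and one distinct lower-relevance frame, the greedy rule keeps one of the duplicates and the distinct frame rather than both duplicates.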
Primary Subject Area: [Content] Media Interpretation
Secondary Subject Area: [Content] Media Interpretation, [Content] Vision and Language
Relevance To Conference: Our work investigates class-incremental learning in the context of online action detection, introducing a new task, Class-Incremental Online Action Detection (CIOAD). We propose a novel framework, the Temporal-Sensitive Incremental Learning Method (TS-ILM), to address this challenge. Specifically, it extracts and preserves the temporal patterns of actions across different tasks, and selects a set of frames with the highest causal relevance and least information redundancy at a temporal level for exemplar replay, thereby retaining the temporal information of previous tasks. We benchmark our approach against state-of-the-art class-incremental learning methods from the image and video domains on the THUMOS'14 and TVSeries datasets, and our method significantly outperforms the previous approaches. Our work addresses the catastrophic forgetting caused by the continuous emergence of new video content, allowing the model to learn continuously and thereby advancing video understanding. It also represents an innovative approach to media content interpretation, contributing to the multimedia field, particularly in the area of media interpretation.
Supplementary Material: zip
Submission Number: 4065