TS-ILM: Class Incremental Learning for Online Action Detection

Published: 01 Jan 2024 · Last Modified: 17 Feb 2025 · ACM Multimedia 2024 · CC BY-SA 4.0
Abstract: Online action detection aims to identify ongoing actions within untrimmed video streams and has extensive applications in real-life scenarios. In practical deployments, however, video frames arrive sequentially over time and new action categories continually emerge, giving rise to catastrophic forgetting, a problem that remains inadequately explored in this setting. In video understanding more broadly, researchers address catastrophic forgetting through class-incremental learning. Online action detection, however, relies solely on historical observations, and therefore demands stronger temporal modeling from class-incremental learning methods. In this paper, we formulate this task as Class-Incremental Online Action Detection (CIOAD) and propose a novel framework, TS-ILM, to address it. Specifically, TS-ILM consists of two components: a task-level temporal pattern extractor and a temporal-sensitive exemplar selector. The former extracts and stores the temporal patterns of actions in different tasks, allowing the data to be comprehensively observed at the temporal level before it is input into the backbone. The latter selects the set of frames with the highest causal relevance and minimum information redundancy for subsequent replay, enabling the model to learn the temporal information of previous tasks more effectively. We benchmark our approach against state-of-the-art class-incremental learning methods from the image and video domains on the THUMOS'14 and TVSeries datasets, and our method outperforms all previous approaches.
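The abstract does not spell out how the temporal-sensitive exemplar selector works, but the stated objective (highest causal relevance, minimum information redundancy) can be illustrated with a generic greedy max-relevance / min-redundancy selection, in the spirit of MMR-style subset selection. The function name, the trade-off parameter `lam`, and the use of cosine similarity as the redundancy measure are all illustrative assumptions, not the paper's actual algorithm:

```python
import numpy as np

def select_exemplars(features, relevance, k, lam=0.5):
    """Greedy max-relevance / min-redundancy frame selection (illustrative sketch).

    features  : (N, D) array of per-frame embeddings
    relevance : (N,) array of per-frame relevance scores
    k         : number of exemplar frames to keep for replay
    lam       : trade-off between relevance and redundancy
    """
    # Cosine similarity between all frame pairs, used as the redundancy measure.
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = normed @ normed.T

    # Seed with the single most relevant frame.
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        candidates = [i for i in range(len(features)) if i not in selected]
        # Score each candidate: reward relevance, penalize its closest
        # similarity to any frame already chosen.
        scores = [lam * relevance[i] - (1 - lam) * sim[i, selected].max()
                  for i in candidates]
        selected.append(candidates[int(np.argmax(scores))])
    return sorted(selected)
```

Under this sketch, the selected frames would then populate the replay buffer that the class-incremental learner revisits when training on later tasks.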