Pooling Transformer for Detection of Risk Events in In-The-Wild Video Ego Data

Rupayan Mallick, Jenny Benois-Pineau, Akka Zemmari, Thinhinane Yebda, Marion Pech, Hélène Amieva, Laura Middleton

Published: 2022, Last Modified: 12 May 2023 | ICPR 2022 | Readers: Everyone

Abstract: This paper proposes a video transformer architecture for detecting risk events for frail adults from egocentric video monitoring data. We first introduce an extended taxonomy of risk events, and then propose a transformer-based video recognition model to detect them. The proposed architecture uses separable attention over the spatial and temporal dimensions. We also introduce a pooling operation on the temporal video data that learns the importance of that data. Experiments were conducted on the visual data of the BIRDS dataset, recorded in the wild, and on Kinetics-400 for benchmarking. Using the pooling operation in the transformer yields an improvement of 3% on the BIRDS dataset.
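The abstract names two ingredients, separable spatial and temporal attention and a temporal pooling step driven by learned importance, without giving their exact form. The sketch below is only an illustration of the general idea in NumPy, not the authors' implementation. Everything in it is an assumption made for the example: the function names, the single-head attention without learned projections, the mean over spatial tokens, the softmax-weighted pooling, and the random stand-in for the learned scoring vector `w`.

```python
import numpy as np


def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x):
    # Single-head scaled dot-product attention over the second-to-last axis.
    # Learned query/key/value projections are omitted for brevity (assumption).
    d = x.shape[-1]
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(d)
    return softmax(scores, axis=-1) @ x


def separable_attention(x):
    # x has shape (T, S, d): T frames, S spatial tokens per frame, d channels.
    # Spatial step: each token attends to the tokens of its own frame.
    x = x + self_attention(x)
    # Temporal step: each spatial location attends across the T frames.
    xt = np.swapaxes(x, 0, 1)          # (S, T, d)
    xt = xt + self_attention(xt)
    return np.swapaxes(xt, 0, 1)       # back to (T, S, d)


def importance_pool(x, w):
    # Hypothetical importance-weighted temporal pooling.
    # w stands in for a learned scoring vector of shape (d,).
    frame_feats = x.mean(axis=1)                  # (T, d), one vector per frame
    weights = softmax(frame_feats @ w, axis=0)    # (T,), importance per frame
    pooled = weights @ frame_feats                # (d,), weighted temporal sum
    return pooled, weights


rng = np.random.default_rng(0)
T, S, d = 8, 16, 32
tokens = rng.standard_normal((T, S, d))   # placeholder for patch embeddings
w = rng.standard_normal(d)                # placeholder for a learned parameter

features = separable_attention(tokens)
pooled, weights = importance_pool(features, w)
```

Splitting attention this way lets each step attend over S or T tokens rather than all T·S tokens at once, and the pooling weights give a per-frame importance score that sums to one. In a trained model, `w` and the attention projections would be learned jointly with the classifier.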
