Pooling Transformer for Detection of Risk Events in In-The-Wild Video Ego Data

Published: 01 Jan 2022, Last Modified: 12 May 2023, ICPR 2022, Readers: Everyone
Abstract: This paper proposes a video transformer architecture for detecting risk events for frail adults from egocentric video monitoring data. We first introduce an extended taxonomy of risk events, and then propose a transformer-based video recognition model to detect them. The proposed architecture applies separable attention to the spatial and temporal dimensions of the data. We also introduce a pooling operation over the temporal video data that is driven by learned importance. Experiments were conducted on the visual data of the BIRDS dataset, which was recorded in the wild, and on Kinetics-400 for benchmarking. Using the pooling operation in the transformer yields an improvement of 3% on the BIRDS dataset.
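The abstract does not spell out how the importance-based temporal pooling is computed. As a rough illustration only, the sketch below assumes one common form: each frame's feature vector is scored against a learned vector, the scores are normalized with a softmax, and the frames are averaged with those weights. The names `importance_pool`, `frame_features`, and `w` are hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def importance_pool(frame_features, w):
    """Pool T per-frame feature vectors into a single vector.

    frame_features: list of T lists, each of length D (one per frame).
    w: list of length D, a hypothetical learned scoring vector
       (an assumption; the paper's actual parameterization may differ).
    Returns (pooled, weights), where weights sum to 1 over the T frames.
    """
    # One scalar importance score per frame: dot product with w.
    scores = [sum(f_d * w_d for f_d, w_d in zip(f, w)) for f in frame_features]
    weights = softmax(scores)
    dim = len(frame_features[0])
    # Importance-weighted average over the temporal axis.
    pooled = [
        sum(weights[t] * frame_features[t][d] for t in range(len(frame_features)))
        for d in range(dim)
    ]
    return pooled, weights


frames = [[1.0, 0.0], [3.0, 2.0]]
# With an all-zero scoring vector every frame is equally important,
# so the result reduces to plain average pooling.
uniform_pooled, uniform_weights = importance_pool(frames, [0.0, 0.0])
# A non-zero scoring vector shifts weight toward higher-scoring frames.
skewed_pooled, skewed_weights = importance_pool(frames, [1.0, 0.0])
```

In this sketch an all-zero `w` recovers ordinary mean pooling, which makes it easy to see what the learned weighting adds: frames that score higher contribute more to the pooled representation.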