Cross-Modal Self-Supervised Feature Extraction for Anomaly Detection in Human Monitoring

Published: 01 Jan 2023 · Last Modified: 13 Nov 2024 · CASE 2023 · CC BY-SA 4.0
Abstract: This paper proposes extracting cross-modal self-supervised features to detect anomalies in human monitoring. Our previous work, which used deep captioning in addition to monitoring images, was successful; however, its unimodally trained image and text features fall short of capturing contextual information that spans the two modalities. We devise a self-supervised method that creates cross-modal features by maximizing the mutual information between the two modalities in a common subspace. This allows capturing the complex joint distributions across modalities and improves the detection performance of clustering methods. Extensive experiments on two real-world datasets show improvements in both AUC and AUPRC over the best baselines: the AUC improves from 0.895 to 0.969 and from 0.97 to 0.98, and the AUPRC improves from 0.681 to 0.850 and from 0.840 to 0.894.
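The abstract does not specify the exact training objective, so the sketch below is only a hedged illustration of the general idea: project each modality's unimodal features into a common subspace and train with a symmetric InfoNCE contrastive loss, a standard lower bound on the mutual information between paired embeddings. The names (CrossModalProjector, info_nce_loss) and the choice of InfoNCE are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalProjector(nn.Module):
    """Hypothetical module: maps unimodal image and text features
    into a shared subspace where mutual information is maximized."""
    def __init__(self, img_dim: int, txt_dim: int, shared_dim: int):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, shared_dim)
        self.txt_proj = nn.Linear(txt_dim, shared_dim)

    def forward(self, img_feats: torch.Tensor, txt_feats: torch.Tensor):
        # L2-normalize so dot products become cosine similarities
        z_img = F.normalize(self.img_proj(img_feats), dim=-1)
        z_txt = F.normalize(self.txt_proj(txt_feats), dim=-1)
        return z_img, z_txt

def info_nce_loss(z_img: torch.Tensor, z_txt: torch.Tensor,
                  temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss (an assumed objective): matching
    image/text pairs in the batch are positives, all other pairs
    are negatives; minimizing it maximizes a lower bound on the
    mutual information between the two modalities."""
    logits = z_img @ z_txt.t() / temperature            # (B, B) similarities
    targets = torch.arange(z_img.size(0), device=z_img.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```

Under this reading, the learned shared-subspace embeddings would then be passed to a clustering-based anomaly detector at monitoring time, consistent with the abstract's claim that the cross-modal features improve the detection performance of clustering methods.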