CRASH: Crash Recognition and Anticipation System Harnessing with Context-Aware and Temporal Focus Attentions
Abstract: Accurately and promptly predicting accidents among surrounding traffic agents from camera footage is crucial for the safety of autonomous vehicles (AVs). This task presents substantial challenges stemming from the unpredictable nature of traffic accidents, their long-tail distribution, the intricacies of traffic scene dynamics, and the inherently constrained field of vision of onboard cameras. To address these challenges, this study introduces a novel accident anticipation framework for AVs, termed CRASH. It seamlessly integrates five components: an object detector, a feature extractor, an object-aware module, a context-aware module, and a multi-layer fusion module. Specifically, we develop the object-aware module to prioritize high-risk objects in complex and ambiguous environments by modeling the spatial-temporal relationships between traffic agents. In parallel, the context-aware module is devised to extend global visual information from the temporal to the frequency domain using the Fast Fourier Transform (FFT) and to capture fine-grained visual features of potential objects and broader context cues within traffic scenes. To capture a wider range of visual cues, we further propose a multi-layer fusion module that dynamically computes the temporal dependencies between different scenes and iteratively updates the correlations between different visual features for accurate and timely accident prediction.
Evaluated on the real-world Dashcam Accident Dataset (DAD), Car Crash Dataset (CCD), and AnAn Accident Detection (A3D) datasets, our model surpasses existing top baselines on critical evaluation metrics such as Average Precision (AP) and mean Time-To-Accident (mTTA). Importantly, its robustness and adaptability are particularly evident in challenging driving scenarios with missing or limited training data, demonstrating significant potential for application in real-world autonomous driving systems.
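The core of the context-aware module described above is a shift of global temporal interactions into the frequency domain before attention is applied. The following is a minimal PyTorch-style sketch of one way such a block could be realized; the module name, the learnable spectral filter, and the attention/MLP layout are illustrative assumptions on our part, not the authors' released implementation.

```python
# Hypothetical sketch: mix global temporal information in the frequency domain
# (FFT) and then apply standard self-attention over frames. Names and layer
# layout are illustrative, not taken from the CRASH code.
import torch
import torch.nn as nn


class ContextAwareBlock(nn.Module):
    def __init__(self, dim: int, num_frames: int, num_heads: int = 4):
        super().__init__()
        # Learnable complex-valued filter applied to the temporal spectrum.
        self.freq_filter = nn.Parameter(torch.randn(num_frames // 2 + 1, dim, 2) * 0.02)
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, dim) global frame-level visual features.
        spec = torch.fft.rfft(x, dim=1)                         # temporal -> frequency domain
        spec = spec * torch.view_as_complex(self.freq_filter)   # learnable spectral gating
        x = x + torch.fft.irfft(spec, n=x.size(1), dim=1)       # back to the temporal domain
        # Fine-grained correlations via self-attention over frames.
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        x = x + self.mlp(self.norm2(x))
        return x


# Example usage with assumed dimensions:
# block = ContextAwareBlock(dim=256, num_frames=100)
# out = block(torch.randn(2, 100, 256))   # (batch=2, frames=100, dim=256)
```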
Primary Subject Area: [Systems] Transport and Delivery
Relevance To Conference: The introduction of Advanced Driver Assistance Systems (ADAS) and Autonomous Vehicles (AVs) marks a significant leap forward in our quest for safer roads. By aiming to predict and prevent traffic accidents before they happen, these technologies are at the forefront of transforming our transportation landscape. This capability is crucial, enabling vehicles to make decisions that avoid collisions and protect passengers.
The main contributions of this study are as follows:
(1) We present a novel context-aware module that extends global interactions into the frequency domain using FFT and introduces context-aware attention blocks to compute fine-grained correlations between nuanced spatial and appearance changes across different objects. Enhanced by the proposed \textbf{multi-layer fusion} (a minimal fusion sketch follows this list), this framework dynamically prioritizes risks in different regions, enriching the visual cues available for accident anticipation.
(2) To realistically simulate the variability and randomness of missing data commonly encountered in real-world driving, we augment the renowned DAD, A3D, and CCD datasets with scenarios featuring missing data (an augmentation sketch also follows this list). This augmentation expands the research scope for accident detection models and provides comprehensive benchmarks for evaluating model performance.
(3) In benchmark tests conducted on the enhanced DAD, A3D, and CCD datasets, CRASH demonstrates superior performance over state-of-the-art (SOTA) baselines across key metrics such as Average Precision (AP) and mean Time-To-Accident (mTTA), whose computation is sketched below. This showcases its remarkable accuracy and applicability across a variety of challenging scenarios, including those with \textbf{10\%-50\% data-missing} and those trained on only 50\%-75\% of the available training scenes.
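The multi-layer fusion named in contribution (1) is described here only at a high level. The sketch below shows one plausible realization as stacked cross-attention layers that iteratively update the correlations between object-aware and context-aware features before scoring per-frame accident risk; class names, layer counts, and dimensions are our assumptions.

```python
# Hypothetical multi-layer fusion: object-aware and context-aware streams
# attend to each other and are refined over several layers. Illustrative only.
import torch
import torch.nn as nn


class FusionLayer(nn.Module):
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.obj_to_ctx = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ctx_to_obj = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_obj = nn.LayerNorm(dim)
        self.norm_ctx = nn.LayerNorm(dim)

    def forward(self, obj_feat, ctx_feat):
        # obj_feat: (B, T, D) aggregated object-aware features
        # ctx_feat: (B, T, D) context-aware (global scene) features
        obj_feat = obj_feat + self.obj_to_ctx(self.norm_obj(obj_feat), ctx_feat, ctx_feat,
                                              need_weights=False)[0]
        ctx_feat = ctx_feat + self.ctx_to_obj(self.norm_ctx(ctx_feat), obj_feat, obj_feat,
                                              need_weights=False)[0]
        return obj_feat, ctx_feat


class MultiLayerFusion(nn.Module):
    def __init__(self, dim: int = 256, num_layers: int = 3):
        super().__init__()
        self.layers = nn.ModuleList([FusionLayer(dim) for _ in range(num_layers)])
        self.head = nn.Linear(2 * dim, 1)

    def forward(self, obj_feat, ctx_feat):
        for layer in self.layers:
            obj_feat, ctx_feat = layer(obj_feat, ctx_feat)
        # Per-frame accident probability from the fused representation.
        return torch.sigmoid(self.head(torch.cat([obj_feat, ctx_feat], dim=-1))).squeeze(-1)
```

For contribution (2), one straightforward way to simulate 10%-50% missing data is to zero-mask a random subset of frames per clip. The sketch below assumes frame-level masking of feature or image tensors; the released benchmarks may use a different masking or interpolation scheme.

```python
# Hypothetical missing-data augmentation: randomly zero out 10%-50% of the
# frames in a clip to mimic sensor dropout or corrupted footage.
import torch


def simulate_missing_frames(clip: torch.Tensor,
                            min_ratio: float = 0.10,
                            max_ratio: float = 0.50) -> torch.Tensor:
    """clip: (frames, ...) feature or image tensor for one video."""
    num_frames = clip.size(0)
    ratio = torch.empty(1).uniform_(min_ratio, max_ratio).item()
    num_drop = int(round(ratio * num_frames))
    drop_idx = torch.randperm(num_frames)[:num_drop]
    clip = clip.clone()
    clip[drop_idx] = 0.0  # zero out the "missing" frames
    return clip
```

For the metrics in contribution (3), AP is the standard average precision over accident scores, and mTTA averages the lead time between the first frame whose predicted risk crosses a threshold and the annotated accident frame. The sketch below uses an assumed frame rate and decision threshold purely for illustration.

```python
# Hypothetical metric computation for AP and mTTA; threshold and fps are
# assumptions, and positive (accident) videos only contribute to mTTA.
import numpy as np
from sklearn.metrics import average_precision_score


def mean_time_to_accident(probs, accident_frames, fps=20.0, threshold=0.5):
    """probs: list of (T,) per-frame accident probabilities for positive videos;
    accident_frames: annotated accident frame index for each of those videos."""
    ttas = []
    for p, acc in zip(probs, accident_frames):
        fired = np.where(p[:acc] >= threshold)[0]
        if fired.size > 0:
            ttas.append((acc - fired[0]) / fps)  # seconds of warning before the accident
        else:
            ttas.append(0.0)                     # model never fired before the accident
    return float(np.mean(ttas)) if ttas else 0.0


def average_precision(scores, labels):
    """scores/labels: per-video maximum risk scores and binary accident labels."""
    return average_precision_score(labels, scores)
```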
Supplementary Material: zip
Submission Number: 1447