Hierarchical Spatiotemporal Fusion for Event-Visible Object Detection

Sin-Ye Jhong, Hsin-Chun Lin, Tzu-Chi Liu, Kai-Lung Hua, Yung-Yao Chen

Published: 2025, Last Modified: 05 Nov 2025, ICRA 2025, CC BY-SA 4.0

Abstract: Traditional visible-light cameras are prone to performance degradation under varying weather and lighting conditions. To address this challenge, we incorporate an event-based camera and propose a novel hierarchical spatiotemporal fusion approach for event-visible object detection. Our method improves detection performance by integrating data from both event-based and visible-light cameras. We have designed three key modules: the Gated Event Accumulation Representation module (GEAR), the Temporal Feature Selection module (TFS), and the Adaptive Fusion module (AF). GEAR and TFS enhance temporal feature fusion at both the image and feature levels, while AF effectively integrates multi-modal features with low computational complexity. Our approach has been trained and validated on the publicly available DSEC-Detection dataset, where it achieves mAP50 and mAP50-95 scores of 67.2% and 45.6%, respectively, demonstrating superior detection performance and validating the effectiveness of the proposed method.
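The abstract does not describe how GEAR is built internally. For readers new to event cameras, here is a minimal sketch of one common baseline for turning an asynchronous event stream into an image-like tensor that a CNN detector can consume: a per-pixel, per-polarity event-count histogram over a fixed time window. This is an illustrative assumption, not the authors' GEAR module. The function name `accumulate_events`, the `(x, y, t, p)` tuple layout, and microsecond timestamps are all hypothetical choices.

```python
from typing import List, Tuple

# Hypothetical event layout: (x, y, t, p) = pixel column, pixel row,
# timestamp in microseconds, polarity (+1 brighter, -1 darker).
Event = Tuple[int, int, int, int]


def accumulate_events(
    events: List[Event],
    width: int,
    height: int,
    t_start: int,
    t_end: int,
) -> List[List[List[int]]]:
    """Baseline polarity histogram (NOT the paper's GEAR module).

    Counts events per pixel and polarity inside [t_start, t_end) and
    returns a 2 x height x width nested list: channel 0 holds positive
    events, channel 1 holds negative events.
    """
    frame = [[[0] * width for _ in range(height)] for _ in range(2)]
    for x, y, t, p in events:
        # Drop events outside the accumulation window.
        if not (t_start <= t < t_end):
            continue
        # Drop events outside the sensor bounds.
        if not (0 <= x < width and 0 <= y < height):
            continue
        channel = 0 if p > 0 else 1
        frame[channel][y][x] += 1
    return frame


if __name__ == "__main__":
    stream = [(0, 0, 10, 1), (0, 0, 20, 1), (1, 0, 30, -1), (1, 1, 500, 1)]
    hist = accumulate_events(stream, width=2, height=2, t_start=0, t_end=100)
    print(hist)
```

Raw counts discard the fine timing of events inside the window; time-weighted histograms and voxel grids that bin events into several temporal slices are common alternatives when that timing matters.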

External IDs: dblp:conf/icra/JhongLLHC25