Multi-Stage Fusion for Event-based Multimodal Tracker

Published: 01 Jan 2024, Last Modified: 05 Mar 2025, ICME 2024, CC BY-SA 4.0
Abstract: Event cameras are bio-inspired sensors with high dynamic range and high temporal resolution, properties that are favorable for visual object tracking. Several existing methods fuse the event modality and the RGB modality with a cross-domain feature integrator to achieve improved tracking performance, and researchers have developed architectures for event modality processing or fusion that successfully boost tracking accuracy. In this work, we design an RGB-E tracker with multi-stage fusion. In the early stage, frames are enhanced with the aid of events to mitigate blur and under/over-exposure degradation. In the middle stage, we apply a fusion module for feature-level integration. In the late stage, we carry out decision-level fusion by predicting tracking boxes from the frame features, the event features, and the fused features, and taking the box with the highest score as the final estimate. Our design thoroughly integrates information at multiple levels, allowing each modality to contribute to the tracking process as much as possible. Extensive experiments demonstrate that the proposed method performs favorably against state-of-the-art RGB-E trackers in both accuracy and efficiency.
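The late-stage decision-level fusion described in the abstract can be illustrated with a minimal sketch: three branches (frame, event, fused) each predict a box and a confidence score, and the most confident prediction is selected. The module names, head designs, and feature shapes below are assumptions for illustration, not the paper's actual implementation.

```python
import torch
import torch.nn as nn


class DecisionLevelFusion(nn.Module):
    """Sketch of decision-level fusion: keep the box from the most confident branch."""

    def __init__(self, feat_dim: int = 256):
        super().__init__()
        # Hypothetical heads mapping a pooled feature vector to a box (cx, cy, w, h) and a score.
        self.box_head = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU(), nn.Linear(256, 4))
        self.score_head = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU(), nn.Linear(256, 1))

    def forward(self, frame_feat, event_feat, fused_feat):
        # Each input: (B, feat_dim) pooled search-region features from one branch.
        boxes, scores = [], []
        for feat in (frame_feat, event_feat, fused_feat):
            boxes.append(self.box_head(feat))        # (B, 4)
            scores.append(self.score_head(feat))     # (B, 1)
        boxes = torch.stack(boxes, dim=1)            # (B, 3, 4)
        scores = torch.stack(scores, dim=1).squeeze(-1)  # (B, 3)
        best = scores.argmax(dim=1)                  # index of the highest-scoring branch
        idx = best.view(-1, 1, 1).expand(-1, 1, 4)
        best_box = boxes.gather(1, idx).squeeze(1)   # (B, 4)
        return best_box, scores.max(dim=1).values


# Usage with dummy features from the three branches.
fusion = DecisionLevelFusion(feat_dim=256)
f, e, x = (torch.randn(2, 256) for _ in range(3))
box, score = fusion(f, e, x)
```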
