Abstract: Visual Place Recognition (VPR) on natural images is challenging due to illumination variations and seasonal changes. For long-term localization, emerging event cameras are naturally resilient to such appearance changes. In this paper, we propose a novel multi-modal network, i.e., VEFNet, for VPR that learns location-specific cross RGB-event modality feature representations. Specifically, we first extract dense visual features from the RGB and event frames separately via a shared Convolutional Neural Network (CNN) backbone. The two branches of features are then fed to a cross-modality attention module to establish correspondences between the two modalities. We also employ a self-attention module to enhance contextual integration within the densely encoded features. Finally, the learned global descriptor serves as the place representation of the dual-modality inputs for VPR. Experimental results demonstrate state-of-the-art (SOTA) performance on public datasets.
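To make the described pipeline concrete, the following is a minimal PyTorch sketch of the shared-backbone, cross-attention, self-attention, and global-descriptor stages. All module names, dimensions, the toy two-layer backbone, the use of `nn.MultiheadAttention`, and the mean-pooling aggregation are assumptions for illustration; the paper's actual architecture may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VEFNetSketch(nn.Module):
    """Hypothetical sketch of the VEFNet pipeline from the abstract.

    Assumes event data is rendered as 3-channel frames so the same
    backbone weights can process both modalities.
    """

    def __init__(self, feat_dim: int = 512, num_heads: int = 8):
        super().__init__()
        # Shared CNN backbone (toy stand-in) extracts dense features
        # from RGB and event frames with the same weights.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Cross-modality attention: RGB features query event features
        # to establish correspondences between the two branches.
        self.cross_attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
        # Self-attention enhances contextual integration within the
        # fused dense features.
        self.self_attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)

    def forward(self, rgb: torch.Tensor, event: torch.Tensor) -> torch.Tensor:
        # Each modality passes through the *shared* backbone separately;
        # flatten spatial dims to token sequences of shape (B, N, C).
        f_rgb = self.backbone(rgb).flatten(2).transpose(1, 2)
        f_evt = self.backbone(event).flatten(2).transpose(1, 2)
        # Cross-modal fusion: query = RGB tokens, key/value = event tokens.
        fused, _ = self.cross_attn(f_rgb, f_evt, f_evt)
        # Contextual refinement over the fused tokens.
        ctx, _ = self.self_attn(fused, fused, fused)
        # Aggregate to one global place descriptor (mean pooling here;
        # a learned aggregator such as NetVLAD could be swapped in).
        desc = ctx.mean(dim=1)
        return F.normalize(desc, dim=-1)

# Usage: retrieval-style matching of L2-normalized descriptors.
model = VEFNetSketch()
rgb = torch.randn(2, 3, 128, 128)
event = torch.randn(2, 3, 128, 128)
descriptors = model(rgb, event)  # (2, 512), unit-norm
```

Because the descriptors are L2-normalized, place matching reduces to a cosine-similarity nearest-neighbor search against a database of reference descriptors.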