Self-supervised multi-frame depth estimation with visual-inertial pose transformer and monocular guidance

Xiang Wang, Haonan Luo, Zihang Wang, Jin Zheng, Xiao Bai

Published: 2024, Last Modified: 11 Nov 2024. Inf. Fusion 2024. CC BY-SA 4.0

Abstract: Highlights
• A new self-supervised multi-frame depth network incorporating IMU modality.
• A visual-inertial fusion Transformer to improve pose estimation involved in multi-frame depth.
• A monocular guided excitation module bridges monocular and multi-frame depth branches.
• Experiments demonstrate improved depth accuracy against previous approaches.