Lite-SVO: Towards A Lightweight Self-Supervised Semantic Visual Odometry Exploiting Multi-Feature Sharing Architecture

Published: 2024, Last Modified: 12 Nov 2025, Venue: ICRA 2024, License: CC BY-SA 4.0
Abstract: Self-supervised semantic visual odometry (SVO), which does not rely on ground-truth data for training, has recently gained considerable attention. A significant challenge in self-supervised SVO is the inconsistency of feature representations between the semantic/depth tasks and the pose task, which may disrupt cross-task feature representations and lead to notable performance degradation. Existing self-supervised SVO methods lack an effective solution to this problem: they either overlook the issue or rely on an overly heavy architecture. In response, we propose Lite-SVO, a lightweight and efficient multi-feature sharing architecture within the Single-Stream framework. Lite-SVO is designed to make self-supervised SVO deployable on edge devices without compromising accuracy or performance. Its key innovation is the multi-feature sharing architecture, which fuses the semantic and depth maps as pose features, significantly reducing model complexity and increasing speed on edge devices. Building on this feature sharing framework, Lite-SVO further optimizes the shared feature representation to improve performance. Specifically, a cross-feature sharing module alleviates the impact of object boundaries on depth estimation, while a multi-feature sharing module extracts and fuses spatial features to enhance pose estimation. Experimental results demonstrate that our method is at least 84.46% faster than the state-of-the-art Single-Stream approaches, and that its pose accuracy is about 79.83% higher than theirs.
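The idea of reusing semantic and depth maps as pose features can be illustrated with a toy sketch. This is not the Lite-SVO implementation: the abstract does not specify the fusion operator, the pose head, or any layer sizes, so the channel-wise concatenation, the global average pooling, the random linear layer, and the names `fuse_features` and `pose_head` are all assumptions made purely for illustration.

```python
import numpy as np


def fuse_features(semantic, depth):
    """Concatenate a semantic map (C, H, W) and a depth map (1, H, W) channel-wise.

    Assumption: plain concatenation stands in for whatever fusion the paper uses.
    """
    assert semantic.shape[1:] == depth.shape[1:], "spatial sizes must match"
    return np.concatenate([semantic, depth], axis=0)  # (C + 1, H, W)


def pose_head(feat_t, feat_t1, weights, bias):
    """Map the fused features of two consecutive frames to a 6-DoF pose vector.

    Assumption: global average pooling plus one linear layer, chosen only
    to keep the sketch small.
    """
    pair = np.concatenate([feat_t, feat_t1], axis=0)  # (2 * (C + 1), H, W)
    pooled = pair.mean(axis=(1, 2))                   # (2 * (C + 1),)
    return weights @ pooled + bias                    # (6,)


# Hypothetical sizes and random inputs, used only to exercise the functions.
rng = np.random.default_rng(0)
C, H, W = 4, 8, 16
sem_t, sem_t1 = rng.random((C, H, W)), rng.random((C, H, W))
dep_t, dep_t1 = rng.random((1, H, W)), rng.random((1, H, W))

fused_t = fuse_features(sem_t, dep_t)
fused_t1 = fuse_features(sem_t1, dep_t1)

weights = rng.standard_normal((6, 2 * (C + 1))) * 0.01  # untrained weights
bias = np.zeros(6)
pose = pose_head(fused_t, fused_t1, weights, bias)
print(pose.shape)
```

In this sketch the pose branch has no image encoder of its own and consumes only the outputs of the semantic and depth branches, which is one plausible reading of how sharing features could reduce model complexity.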