Abstract: In visual SLAM (VSLAM) systems, loop closure plays a crucial role in reducing accumulated error. However, VSLAM systems that rely on low-level visual features often suffer from perceptual aliasing in repetitive environments, where scenes at different locations are incorrectly identified as the same place. Existing work has attempted to introduce object-level features or artificial landmarks: the former struggles to distinguish visually similar but distinct objects, while the latter is time-consuming and labor-intensive. This paper introduces a novel loop closure detection method that leverages pretrained AI foundation models to extract rich semantic information from specific types of objects (e.g., door numbers), referred to as semantic anchors, which help to better distinguish similar scenes. In settings such as office buildings, hotels, and warehouses, this approach improves the robustness of loop closure detection. We validate the effectiveness of our method through experiments in both simulated and real-world environments.