Abstract: The advancement of object detection algorithms has catalyzed the development of object-level semantic SLAM. However, due to missed and false detections, object-level semantic SLAM fails to represent the objects within a scene adequately. Therefore, this paper proposes a novel object-level semantic SLAM system, termed HSS-SLAM. We incorporate a human-in-the-loop mechanism into our method, establishing an interaction module that allows users to edit and rectify semantic information. Additionally, to minimize the manual correction workload, a lightweight and intuitive method for semantic extension is proposed, augmenting the semantic richness of the global map with only a few operations. Furthermore, our method adopts superquadrics for object representation, enabling detailed descriptions of diverse object shapes. This mitigates a limitation of conventional semantic mapping, where objects are difficult to distinguish because they rely on a single-shape representation. Precise estimation of superquadric parameters and camera poses is then achieved through joint optimization. Extensive experiments conducted on the TUM RGB-D and Scenes V2 datasets demonstrate that the proposed approach exhibits competitive performance, surpassing current methods in both object representation and camera localization accuracy.
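For context on the superquadric representation mentioned above: the abstract does not specify the exact parameterization used by HSS-SLAM, but superquadric-based object SLAM methods commonly describe each object with the standard implicit (inside-outside) function, whose five shape parameters (three scales $a_1, a_2, a_3$ and two exponents $\varepsilon_1, \varepsilon_2$) span shapes from ellipsoids to boxes and cylinders. A minimal sketch of this standard form, under the assumption that a similar parameterization is optimized jointly with the camera poses, is:

\[
F(x, y, z) = \left( \left( \frac{x}{a_1} \right)^{\frac{2}{\varepsilon_2}} + \left( \frac{y}{a_2} \right)^{\frac{2}{\varepsilon_2}} \right)^{\frac{\varepsilon_2}{\varepsilon_1}} + \left( \frac{z}{a_3} \right)^{\frac{2}{\varepsilon_1}},
\]

where $F = 1$ on the surface, $F < 1$ inside, and $F > 1$ outside the object in its canonical frame.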