Abstract: Keyframe insertion is critical for the performance and robustness of SLAM systems. However, traditional heuristic-based methods often select keyframes suboptimally, which compromises the accuracy of localization and mapping. To address this, we propose KINND, a lightweight neural network-based framework for real-time keyframe insertion. The framework introduces a novel foundational paradigm for learning-based keyframe insertion that covers both the model architecture and the training methodology. The neural network model uses a hierarchical weighted self-attention mechanism to encode real-time SLAM state information into high-dimensional representations, from which it produces keyframe insertion decisions. To overcome the absence of ground truth for keyframe insertion, we develop a composite loss function that integrates pose error and system state information, providing a metric for this task. Additionally, a novel training mode enhances the model's real-time decision-making capabilities. Experimental results on public and private datasets demonstrate that KINND operates in real time without requiring a GPU and, after a single training session on a public dataset, achieves superior generalization performance on other datasets.