CSO: Constraint-guided Space Optimization for Active Scene Mapping

Published: 20 Jul 2024, Last Modified: 21 Jul 2024, Venue: MM2024 Poster, License: CC BY 4.0
Abstract: Simultaneously mapping and exploring a complex unknown scene is an NP-hard problem that remains challenging despite the rapid development of deep learning techniques. We present CSO, a deep reinforcement learning-based framework for efficient active scene mapping. Constraint-guided space optimization is applied to both the state space and the critic space to make the globally optimal exploration path easier to find and to avoid long-distance round trips during exploration. We first feed the frontier-based entropy into the network as an input constraint alongside the raw observation, which guides training to start by imitating local greedy search. However, entropy-based optimization can easily get stuck in a few local optima or cause inefficient round trips, since the entropy space and the real world do not share the same metric. Inspired by constrained reinforcement learning, we then introduce an action mask-based optimization constraint to align the metrics of these two spaces. Exploration optimization in the aligned spaces avoids long-distance round trips more effectively. We evaluate our method with a ground robot in 29 complex indoor scenes of different scales. On average, our method achieves 19.16\% higher exploration efficiency and 3.12\% higher exploration completeness than state-of-the-art alternatives. We also deploy our method in real-world scenes, where it efficiently explores an area of 649 $m^2$. The experiment video can be found in the supplementary material.
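As a rough illustration of the action-mask idea mentioned in the abstract, the sketch below zeroes out the probability of candidate actions that a constraint rules out before the policy samples. The abstract does not specify how CSO builds its mask, so the distance threshold used here is purely an assumption, and the names `distance_mask`, `masked_action_probs`, and `max_dist` are hypothetical.

```python
import numpy as np


def distance_mask(robot_xy, candidates_xy, max_dist):
    """Mark candidate targets within max_dist of the robot as valid.

    Assumption for illustration only: a Euclidean distance threshold
    stands in for whatever constraint the actual method uses.
    """
    robot = np.asarray(robot_xy, dtype=float)
    candidates = np.asarray(candidates_xy, dtype=float)
    dists = np.linalg.norm(candidates - robot, axis=1)
    mask = dists <= max_dist
    if not mask.any():
        # Keep the nearest candidate so at least one action stays valid.
        mask[np.argmin(dists)] = True
    return mask


def masked_action_probs(logits, mask):
    """Softmax over logits where masked-out actions get probability zero."""
    logits = np.asarray(logits, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        raise ValueError("at least one action must be valid")
    masked = np.where(mask, logits, -np.inf)
    shifted = masked - masked.max()  # subtract the max for numerical stability
    exp = np.exp(shifted)            # exp(-inf) evaluates to 0.0
    return exp / exp.sum()


if __name__ == "__main__":
    mask = distance_mask((0.0, 0.0), [(1.0, 0.0), (10.0, 0.0), (0.0, 2.0)], max_dist=3.0)
    print(mask)
    print(masked_action_probs([1.0, 2.0, 3.0], mask))
```

Masking the logits rather than penalizing the reward means a disallowed action can never be sampled, which is one common way to impose a hard constraint in reinforcement learning.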
Primary Subject Area: [Content] Multimodal Fusion
Secondary Subject Area: [Experience] Multimedia Applications
Relevance To Conference: In our work, we present CSO, a deep reinforcement learning-based framework that applies constraint-guided space optimization to both the state and critic spaces to make the globally optimal exploration path easier to find and to avoid long-distance round trips during exploration. Our research offers a promising way to automate the scanning process in unknown indoor environments, bridging the gap between multimedia processing and robotic applications. By integrating reinforcement learning techniques, we provide a more effective and scalable solution for active scene mapping with a single robot. This aligns well with the conference's focus on advancing multimodal fusion techniques. Furthermore, applying our method in real-world scenarios demonstrates its practical utility in multimedia applications. We believe that our manuscript is well suited for publication in ACM MM 2024 and will make a valuable contribution to the development of multimedia processing techniques.
Supplementary Material: zip
Submission Number: 2372