Semantic Mapping in Indoor Embodied AI - A Survey on Advances, Challenges, and Future Directions

TMLR Paper4395 Authors

03 Mar 2025 (modified: 14 Jun 2025), Under review for TMLR, CC BY 4.0
Abstract: Intelligent embodied agents (e.g., robots) need to perform complex semantic tasks in unfamiliar environments. Among the many skills these agents need, building and maintaining a semantic map of the environment is the most crucial for long-horizon tasks. A semantic map captures information about the environment in a structured way, allowing the agent to reference it for advanced reasoning throughout the task. While existing surveys in embodied AI focus on general advancements or on specific tasks such as navigation and manipulation, this paper provides a comprehensive review of semantic map-building approaches in embodied AI, specifically for indoor navigation. We categorize these approaches by their structural representation (spatial grids, topological graphs, dense point clouds, or hybrid maps) and by the type of information they encode (implicit features or explicit environmental data). We also explore the strengths and limitations of these map-building techniques, highlight current challenges, and propose future research directions. We find that the field is moving towards open-vocabulary, queryable, task-agnostic map representations, while high memory demands and computational inefficiency remain open challenges. This survey aims to guide current and future researchers in advancing semantic mapping techniques for embodied AI systems.
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission: **Summary of changes:**

1. We revised the introduction to connect it better to the rest of the survey and to motivate the key categorizations.
2. Figure 1 (previously Figure 4) and Table 1 (previously Table 2) now together serve as the summary of the survey.
3. We added all the reference papers suggested by the AE and the reviewers to Table 1 and to the appropriate sections. We also added these references to Tables 3, 4, and 6 (previously Tables 3, 4, and 5), where appropriate.
4. We moved the previous Figure 1 to Figure 2, which serves as the motivation for semantic mapping.
5. We revised the Evaluation section (now section 6) to contain two sub-sections: one for extrinsic, or task-level, evaluation (section 6.1) and one for intrinsic, or map-level, evaluation (section 6.2). We added Table 7 to summarize the metrics, highlighted the key takeaways in section 6.3 (Summary), and discussed their strengths and limitations.
6. We added a new section on challenges (now section 7), where we highlight the key open problems in the field as a whole, across multiple dimensions.
7. For consistency with section 4, we added summaries to sections 5 (5.1, 5.2.1, 5.2.2), 6, 7, and 8.
8. We revised section 4.3 to cover 'dense geometric map', under which we separately discuss 'point-cloud map' (section 4.3.1) and 'neural fields' (section 4.3.2). We discuss how prior work in robotics, embodied AI, and 3D scene understanding uses both types of maps, and we added Table 5 to summarize them.
9. We revised section 2 to connect the background reading to the rest of the survey.
10. We also revised section 3 to be more structured and relevant to the rest of the survey, and removed the redundant section 3.5, moving some of its relevant discussion to section 7 (challenges).
Assigned Action Editor: ~Matthew_Walter1
Submission Number: 4395