Readers: Everyone (since 13 May 2025). License: CC BY 4.0.
Searching for objects in cluttered environments requires selecting efficient viewpoints and manipulation actions to resolve occlusions and reduce uncertainty about object locations, shapes, and categories. We address the problem of manipulation-enhanced semantic mapping, where a robot efficiently identifies all objects in a cluttered shelf. Although Partially Observable Markov Decision Processes (POMDPs) are standard for decision-making under uncertainty, representing unstructured interactive worlds remains challenging in this formalism. To overcome this, we introduce a novel POMDP framework that summarizes beliefs using a metric-semantic grid map and leverages neural networks for efficient belief updates, simultaneously reasoning about object geometries, locations, categories, occlusions, and manipulation physics. To ensure efficient exploration via information gain maximization, we propose Calibrated Neural-Accelerated Belief Updates (CNABUs), which provide confidence-calibrated predictions that generalize to novel scenarios. Our experiments demonstrate improved map completeness and accuracy over existing methods, with successful zero-shot transfer to real-world cluttered shelves.
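To make the idea of exploration via information gain maximization concrete, the following is a minimal sketch, not the method of the paper. It assumes the belief is stored as a per-cell categorical distribution over a small grid, and that some predictor (in the paper this role would be played by the neural belief update) has already produced a predicted belief for each candidate action. The function names, array shapes, and action labels are all hypothetical, and the entropy difference used here is a simple proxy for expected information gain rather than a full expectation over observations.

```python
import numpy as np

def cell_entropy(belief, eps=1e-12):
    """Shannon entropy (in nats) of each cell's categorical belief.

    belief: array of shape (H, W, C) whose last axis sums to 1 (assumed layout).
    """
    p = np.clip(belief, eps, 1.0)  # avoid log(0)
    return -np.sum(p * np.log(p), axis=-1)

def entropy_reduction(current_belief, predicted_belief):
    """Total map entropy removed when moving from the current to the predicted belief."""
    return float(np.sum(cell_entropy(current_belief) - cell_entropy(predicted_belief)))

def select_action(current_belief, predicted_beliefs):
    """Return the candidate action with the largest entropy reduction, plus all gains.

    predicted_beliefs: dict mapping a hypothetical action label to its predicted belief map.
    """
    gains = {a: entropy_reduction(current_belief, b) for a, b in predicted_beliefs.items()}
    return max(gains, key=gains.get), gains

# Toy example: a 2x2 map with 3 categories, starting from a uniform belief.
belief = np.full((2, 2, 3), 1.0 / 3.0)
confident = np.array([0.98, 0.01, 0.01])

after_view = belief.copy()
after_view[0, 0] = confident   # a new viewpoint resolves one cell

after_push = belief.copy()
after_push[0, 0] = confident   # a push reveals two cells
after_push[1, 1] = confident

best, gains = select_action(belief, {"move_camera": after_view, "push_object": after_push})
print(best, gains)
```

The sketch also hints at why calibration matters in this setting: if the predicted beliefs are overconfident, their entropy is artificially low, and the action ranking favors actions that only appear informative.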