Abstract: Metric-semantic 3D mapping is the process of creating class-labeled 3D maps by fusing the information from images captured by a moving camera. The memory usage required by standard solutions grows linearly with the number of semantic classes being considered, which can pose a bottleneck in large and many-class scenes. This paper proposes two novel methods for compressing the memory used by semantic fusion: calibrated top-k histogram and encoded fusion. The first method maintains, for each voxel, only the counts of the k most likely classes, while the second method uses a neural network to encode all-class probability vectors into a k-dimensional latent space in which per-voxel fusion is performed. The fused result is then decoded, at query time, using another neural network. Experiments show that both methods preserve map accuracy and calibration even at low values of k, and per-voxel memory usage is linear in k. The proposed methods can achieve real-time semantic fusion with 150 classes on commodity GPUs in building-scale scenes where prior approaches run out of memory.
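The top-k histogram idea can be illustrated with a minimal sketch. This is not the paper's exact algorithm; the class, its eviction rule (a "space-saving"-style takeover of the least-supported class), and all names here are illustrative assumptions:

```python
class TopKHistogram:
    """Per-voxel class histogram truncated to at most k entries.

    Hypothetical sketch of top-k semantic fusion: each voxel stores at
    most k (class_id, count) pairs, so memory is linear in k rather than
    in the total number of semantic classes. When a new class arrives and
    the histogram is full, the class with the smallest count is evicted
    (a space-saving-style streaming heuristic, assumed here).
    """

    def __init__(self, k):
        self.k = k
        self.counts = {}  # class_id -> observation count

    def fuse(self, class_id):
        """Fuse one observed class label into the voxel's histogram."""
        if class_id in self.counts:
            self.counts[class_id] += 1
        elif len(self.counts) < self.k:
            self.counts[class_id] = 1
        else:
            # Evict the least-supported class to stay within k entries;
            # the newcomer inherits its count (space-saving style).
            victim = min(self.counts, key=self.counts.get)
            c = self.counts.pop(victim)
            self.counts[class_id] = c + 1

    def most_likely(self):
        """Return the class with the highest fused count."""
        return max(self.counts, key=self.counts.get)


# One voxel observed from several frames; only 3 class slots are kept.
voxel = TopKHistogram(k=3)
for obs in [5, 5, 7, 5, 2, 9, 5]:
    voxel.fuse(obs)
print(voxel.most_likely())  # -> 5 (the dominant class survives truncation)
```

With k small relative to the class count (e.g. 3 slots versus 150 classes), per-voxel storage stays constant while the dominant labels are retained.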