Abstract: In this paper, we propose TUPPer-Map, a metric-semantic mapping framework based on unified panoptic segmentation and temporal data association. In contrast to previous mapping methods, our framework integrates the data association stage into the holistic pixel-level segmentation stage in an end-to-end fashion, taking advantage of both intra-frame (spatial) and inter-frame (temporal) knowledge. First, we unify the instance segmentation and semantic segmentation branches into a single network with a shared backbone to maximize 2D panoptic segmentation performance. Next, we leverage geometric segmentation to refine the segments predicted by the network. Then, we design a novel deep-learning-based data association module to track object instances across frames: the optical flow between consecutive frames and the alignment of ROI (Region of Interest) candidates are learned to predict frame-consistent instance labels. Finally, the 2D semantics are integrated into a 3D volume by TSDF raycasting to build the final map. We evaluate our framework extensively on the SceneNN, ScanNet v2, and Cityscapes-VPS datasets. The experimental results show that TUPPer-Map outperforms existing semantic mapping methods. Overall, our work illustrates that a learning-based data association strategy can enable a more unified perception network for 3D mapping.
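To make the final fusion step concrete, the following is a minimal sketch of how a single observation could be fused into one voxel of a TSDF volume together with a semantic label. It assumes the standard running weighted-average TSDF update and a simple majority vote over labels; the abstract does not specify TUPPer-Map's actual fusion rule. The names `Voxel`, `integrate`, and `voxel_label`, as well as the default truncation distance and observation weight, are hypothetical and chosen only for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Voxel:
    """One cell of a TSDF volume with per-label vote counts (hypothetical layout)."""
    tsdf: float = 0.0
    weight: float = 0.0
    label_votes: Counter = field(default_factory=Counter)


def integrate(voxel, sdf, label, truncation=0.1, obs_weight=1.0):
    """Fuse one observation (signed distance in metres plus a 2D label) into a voxel.

    Assumption: standard weighted-average TSDF update, not the paper's exact rule.
    """
    if sdf < -truncation:
        # The voxel lies far behind the observed surface (occluded): skip it.
        return
    # Normalise by the truncation distance and clamp the positive side to 1.
    d = min(1.0, sdf / truncation)
    new_weight = voxel.weight + obs_weight
    voxel.tsdf = (voxel.tsdf * voxel.weight + d * obs_weight) / new_weight
    voxel.weight = new_weight
    # Each observation casts one vote for the label seen along its ray.
    voxel.label_votes[label] += 1


def voxel_label(voxel):
    """Return the majority label of a voxel, or None if it was never observed."""
    if not voxel.label_votes:
        return None
    return voxel.label_votes.most_common(1)[0][0]
```

In this sketch, frame-consistent instance labels matter because the vote is only meaningful if the same object carries the same label in every frame; inconsistent labels would split the votes of one object across several identities.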