Geometrically Consistent Monocular Metric-Semantic 3D Mapping for Indoor Environments with Transparent and Reflecting Objects
Abstract: 3D mapping is crucial for many applications in robotics and related industries. Building dense, high-quality point clouds requires accurate depth estimation or completion. This paper presents a metric-semantic mapping pipeline based on Deep Neural Networks (DNNs) that ensures geometric consistency, with enhancements for challenging environments containing transparent and reflective objects such as glass walls, doors, and mirrors. The proposed approach uses camera ego-motion together with sparse visual features to resolve the scale ambiguity inherent in affine-invariant monocular depth estimates and to restore metrically consistent depth information. Visual-inertial odometry data is used for camera pose graph optimization, removing the need for RGB-D cameras. The pipeline incorporates semantic segmentation and robust filtering to refine point clouds by removing outliers associated with mirrors and glass surfaces. Latency-aware performance and quality evaluation of 3D scene reconstruction was carried out both on a specially prepared dataset of office-like scenes with multiple transparent objects and on the public ScanNet dataset. Quantitative and qualitative results show that the proposed solution outperforms other state-of-the-art DNN-based models and algorithms, as well as RGB-D cameras, in terms of metric depth geometric consistency, 3D reconstruction accuracy, and the ability to preserve mesh quality in challenging scenarios with transparent and reflective surfaces.
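The scale-recovery step described in the abstract can be illustrated concretely. The sketch below assumes the metric alignment reduces to a least-squares scale-and-shift fit of an affine-invariant depth map against sparse metric depths triangulated by the visual-inertial odometry; the function name, array layout, and fitting choice are illustrative assumptions, not the paper's actual implementation, which additionally applies semantic masking and robust filtering.

```python
import numpy as np

def align_affine_depth(pred_depth, sparse_px, sparse_depth):
    """Recover metric depth from an affine-invariant (scale/shift-ambiguous)
    monocular prediction.

    pred_depth   : (H, W) affine-invariant depth map from the DNN
    sparse_px    : (N, 2) integer pixel coordinates (u, v) of sparse features
    sparse_depth : (N,) metric depths of those features from triangulation

    Solves min over (s, t) of || s * d_pred + t - d_sparse ||^2 in closed
    form and applies the fitted scale s and shift t to the whole map.
    """
    d_pred = pred_depth[sparse_px[:, 1], sparse_px[:, 0]]
    # Design matrix for the linear model s * d_pred + t.
    A = np.stack([d_pred, np.ones_like(d_pred)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, sparse_depth, rcond=None)
    return s * pred_depth + t

# Example: align a synthetic prediction against 50 sparse metric anchors.
rng = np.random.default_rng(0)
metric = rng.uniform(0.5, 5.0, size=(480, 640))      # ground-truth stand-in
pred = 0.4 * metric - 0.1                             # unknown scale/shift
px = np.column_stack([rng.integers(0, 640, 50), rng.integers(0, 480, 50)])
aligned = align_affine_depth(pred, px, metric[px[:, 1], px[:, 0]])
print(np.abs(aligned - metric).max())                 # ~0: scale recovered
```

In practice a robust variant (e.g., RANSAC or an M-estimator over the sparse residuals) would be preferred, since features landing on glass or mirror surfaces can yield grossly wrong triangulated depths.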