Abstract: We present CIS2VR (CNN-based Indoor Scan to VR), an authoring framework designed to transform input RGB-D scans captured by conventional sensors into an interactive VR environment. Existing state-of-the-art 3D instance segmentation algorithms are employed to extract object instances from RGB-D scans. A novel 3D Convolutional Neural Network (3D CNN) architecture is used to learn 3D shape features common to both the classification and 3D pose estimation problems, enabling rapid shape encoding and pose estimation of objects detected in the scan. The generated embedding vector and predicted pose are then used to retrieve and align a matching 3D CAD (Computer-Aided Design) model. The aligned models, along with the estimated layout of the scene, are transferred to Unity, a 3D game engine, to create a VR scene. An optional human-in-the-loop system allows users to validate results at various steps of the pipeline, improving the quality of the final VR scene. We evaluate and compare our approach to existing semantic reconstruction methods on key metrics. The proposed approach outperforms several existing methods in object alignment, approaching the state of the art while speeding up the process by an order of magnitude. CIS2VR takes an average of 0.68 seconds for the entire conversion across our test dataset of 312 scenes. The code for the proposed framework will be made publicly available on GitHub.
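To make the retrieval-and-alignment step concrete, the Python sketch below shows one plausible way an embedding produced by the 3D CNN could be matched against a precomputed CAD database by nearest-neighbor search, and how a predicted pose could be applied to the retrieved model. This is a minimal illustration, not the paper's implementation: the function names (retrieve_cad_model, align_cad_model), the L2 distance metric, and the pose parameterization (rotation, translation, uniform scale) are all assumptions.

import numpy as np

def retrieve_cad_model(query_embedding, cad_embeddings, cad_ids):
    # Nearest neighbor in embedding space under L2 distance
    # (hypothetical; the paper does not specify the metric).
    dists = np.linalg.norm(cad_embeddings - query_embedding, axis=1)
    return cad_ids[int(np.argmin(dists))]

def align_cad_model(vertices, rotation, translation, scale=1.0):
    # Apply an assumed pose parameterization: scale, rotate,
    # then translate each CAD mesh vertex into scene coordinates.
    return (vertices * scale) @ rotation.T + translation

# Example usage with random placeholders for the learned quantities.
query = np.random.randn(256)                 # embedding from the 3D CNN
db = np.random.randn(1000, 256)              # precomputed CAD embeddings
ids = [f"cad_{i:04d}" for i in range(1000)]
best_id = retrieve_cad_model(query, db, ids)
R, t = np.eye(3), np.array([1.0, 0.0, 0.5])  # predicted rotation/translation
aligned = align_cad_model(np.random.rand(100, 3), R, t)

In practice the aligned vertices (or the transform itself) would then be handed to Unity to instantiate the model in the VR scene.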