Abstract: We present CIS2VR (CNN-based Indoor Scan to VR), an authoring framework designed to transform input RGB-D scans captured by conventional sensors into an interactive VR environment. Existing state-of-the-art 3D instance segmentation algorithms are employed to extract object instances from RGB-D scans. A novel 3D Convolutional Neural Network (3D CNN) architecture is used to learn 3D shape features common to both the classification and 3D pose estimation problems, enabling rapid shape encoding and pose estimation of objects detected in the scan. The generated embedding vector and predicted pose are then used to retrieve and align a matching 3D CAD (Computer-Aided Design) model. The aligned models, along with the estimated layout of the scene, are transferred to Unity, a 3D game engine, to create a VR scene. An optional human-in-the-loop system allows users to validate results at various steps of the pipeline, improving the quality of the final VR scene. We evaluate and compare our approach to existing semantic reconstruction methods on key metrics. The proposed approach outperforms several existing methods in object alignment, approaching the state of the art while speeding up the process by an order of magnitude. CIS2VR takes an average of 0.68 seconds for the entire conversion across our test dataset of 312 scenes. The code for the proposed framework will be made publicly available on GitHub.
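To make the retrieval-and-alignment step concrete, the Python sketch below shows one plausible way an embedding produced by the 3D CNN could be matched against a precomputed CAD database by nearest-neighbor search, and how a predicted pose could be applied to the retrieved model. This is a minimal illustration, not the paper's implementation: the function names (retrieve_cad_model, align_cad_model), the L2 distance metric, and the pose parameterization (rotation, translation, uniform scale) are all assumptions.

import numpy as np

def retrieve_cad_model(query_embedding, cad_embeddings, cad_ids):
    # Nearest neighbor in embedding space under L2 distance
    # (hypothetical; the paper does not specify the metric).
    dists = np.linalg.norm(cad_embeddings - query_embedding, axis=1)
    return cad_ids[int(np.argmin(dists))]

def align_cad_model(vertices, rotation, translation, scale=1.0):
    # Apply an assumed pose parameterization: scale, rotate,
    # then translate each CAD mesh vertex into scene coordinates.
    return (vertices * scale) @ rotation.T + translation

# Example usage with random placeholders for the learned quantities.
query = np.random.randn(256)                 # embedding from the 3D CNN
db = np.random.randn(1000, 256)              # precomputed CAD embeddings
ids = [f"cad_{i:04d}" for i in range(1000)]
best_id = retrieve_cad_model(query, db, ids)
R, t = np.eye(3), np.array([1.0, 0.0, 0.5])  # predicted rotation/translation
aligned = align_cad_model(np.random.rand(100, 3), R, t)

In practice the aligned vertices (or the transform itself) would then be handed to Unity to instantiate the model in the VR scene.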