Abstract: This paper presents an automated system that streamlines the creation of interactive real estate video tours. These virtual walkthrough tours let potential buyers explore a property by skipping or focusing on rooms of interest, which supports their decision-making. However, the current manual method of producing these tours is costly and time-consuming. We propose a system that automates key steps of walkthrough video creation, including the identification of room transitions and the extraction of room labels. The system uses transformer-based video segmentation and addresses challenges such as the lack of clear visual boundaries between open-plan rooms and the difficulty of classifying rooms in unfurnished properties. We demonstrate in an ablation study that combining ResNet frame embeddings with transformer-based temporal postprocessing, which uses a separately trained doorway detection network as extra input, yields the best results for roo