Abstract: The 3D reconstruction of a real-world scene, usually represented as a textured mesh, supports the fusion of various types of information and can help solve complex problems. For example, fusing the 3D textured mesh with a relatively small set of semantically annotated input images can generate a supplementary semantic mesh. Given a set of consecutive camera positions, the two meshes can be used to generate novel RGB, depth, or semantic images. However, these meshes cannot represent dynamic objects, as such objects tend to vanish during the 3D textured mesh construction process. The main goal of this work is to provide a solution for generating training sequences that include dynamic entities. For a given camera pose and time instant, a new RGB, depth, or semantically annotated image can thus be generated using a mesh instance that inserts the dynamic entities corresponding to the given timestamp. The proposed solution uses Blender, an open-source software tool, to place and animate photorealistic mesh models of the different dynamic entities within the 3D mesh representation of the real-world scene. We use this tool to manage environment occlusions and shadows, and to mimic the properties of the scene. After photorealistic rendering, we obtain a set of images of the dynamic objects and their shadows. We propose a technique to transfer the generated dynamic information into RGB, depth, and semantic images. The method focuses on entities such as moving cars and pedestrians, but any dynamic entity is suitable. We used a subset of the UAVid dataset to test the practical viability of the solution for supervised semantic segmentation training. Experimental results show that using the enriched images, as opposed to the initial images, increases performance on the semantic segmentation task by 10.77% mIoU.
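To give a concrete sense of the kind of Blender scripting the abstract implies (placing a dynamic model in the reconstructed scene mesh, keyframing its motion, and rendering a frame for a given camera pose and time instant), the sketch below uses the bpy API. It is only a minimal illustration, not the authors' pipeline: the file paths, object names, camera pose values, and frame numbers are placeholder assumptions.

```python
# Minimal sketch: insert a dynamic car model into a reconstructed scene mesh,
# keyframe its motion, and render the RGB frame for a chosen timestamp.
# All file names, object names, and pose values below are illustrative assumptions.
import bpy

# Load the static scene reconstruction and a photorealistic car model (assumed paths).
bpy.ops.wm.obj_import(filepath="scene_mesh.obj")
bpy.ops.wm.obj_import(filepath="car_model.obj")
car = bpy.context.selected_objects[0]  # last import leaves the car selected

# Animate the car between two keyframes so it sits at the right place for each timestamp.
car.location = (0.0, 0.0, 0.0)
car.keyframe_insert(data_path="location", frame=1)
car.location = (10.0, 0.0, 0.0)
car.keyframe_insert(data_path="location", frame=60)

# Set the camera to the given pose (placeholder values).
cam = bpy.data.objects["Camera"]
cam.location = (5.0, -12.0, 3.0)
cam.rotation_euler = (1.3, 0.0, 0.4)

# Enable a depth (Z) pass alongside the RGB render.
scene = bpy.context.scene
scene.view_layers["ViewLayer"].use_pass_z = True

# Render the frame corresponding to the requested time instant.
scene.frame_set(30)
scene.render.filepath = "//render_frame_030.png"
bpy.ops.render.render(write_still=True)
```

In the paper's setting, a script along these lines would be run per camera pose and timestamp, and the rendered dynamic objects and their shadows would then be composited into the RGB, depth, and semantic images as described in the following sections.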