Urban 3D semantic modelling using stereo vision

Sunando Sengupta, Eric Greveson, Ali Shahrokni, Philip H. S. Torr

2013 (modified: 21 Feb 2024)ICRA 2013Readers: Everyone

Abstract: In this paper we propose a robust algorithm that generates an efficient and accurate dense 3D reconstruction with associated semantic labellings. Intelligent autonomous systems require accurate 3D reconstructions for applications such as navigation and localisation. Such systems also need to recognise their surroundings in order to identify and interact with objects of interest. Considerable emphasis has been given to generating a good reconstruction but less effort has gone into generating a 3D semantic model. The inputs to our algorithm are street level stereo image pairs acquired from a camera mounted on a moving vehicle. The depth-maps, generated from the stereo pairs across time, are fused into a global 3D volume online in order to accommodate arbitrary long image sequences. The street level images are automatically labelled using a Conditional Random Field (CRF) framework exploiting stereo images, and label estimates are aggregated to annotate the 3D volume. We evaluate our approach on the KITTI odometry dataset and have manually generated ground truth for object class segmentation. Our qualitative evaluation is performed on various sequences of the dataset and we also quantify our results on a representative subset.

0 Replies