- Reviewed Version (pdf): https://openreview.net/references/pdf?id=yOo2jYRKC
- Keywords: Layout Estimation, Deep Stereo, Computer Vision
- Abstract: Accurate layout estimation is crucial for planning and navigation in robotics applications such as self-driving. In this paper, we introduce the Stereo Bird's Eye View Network (SBEVNet), a novel supervised end-to-end framework for estimating the bird's eye view (BEV) layout from a pair of stereo images. Although our network reuses building blocks from state-of-the-art deep learning networks for disparity estimation, we show that accurate depth estimation is neither sufficient nor necessary. Instead, learning a good internal bird's eye view feature representation is essential for layout estimation. Specifically, we first generate a disparity feature volume from the features of the stereo images and then project it to bird's eye view coordinates. This gives us coarse-grained structural information about the scene. We also apply inverse perspective mapping (IPM) to map the input images and their features to the bird's eye view. This gives us fine-grained texture information. Concatenating the IPM features with the projected feature volume creates a rich bird's eye view representation that is capable of spatial reasoning. We use this representation to estimate the BEV semantic map. Additionally, we show that using the IPM features as a supervisory signal for the stereo features can improve performance. We demonstrate our approach on two datasets: the KITTI dataset and a synthetic dataset generated using the CARLA simulator. On both datasets, we establish state-of-the-art performance, outperforming the baselines.
- One-sentence Summary: A novel end-to-end method for stereo layout estimation.
- Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
- Supplementary Material: zip
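
The abstract mentions two geometric mappings into the bird's eye view: projecting disparity-indexed features to BEV coordinates, and inverse perspective mapping (IPM). The sketch below illustrates only the textbook geometry behind those two mappings, not SBEVNet itself, which operates on learned feature volumes. It assumes a rectified pinhole stereo pair, an optical axis parallel to a flat ground plane, and image rows increasing downward; the function and parameter names (`disparity_to_bev`, `ground_to_pixel`, `f`, `baseline`, `cx`, `cy`, `cam_height`) are hypothetical and do not come from the paper.

```python
import numpy as np


def disparity_to_bev(u, d, f, baseline, cx):
    """Map image column u and disparity d (both in pixels) to BEV (x, z) in metres.

    Assumes rectified stereo with focal length f (pixels), stereo baseline
    (metres), and principal point column cx (pixels).
    """
    u = np.asarray(u, dtype=float)
    d = np.asarray(d, dtype=float)
    z = f * baseline / d      # depth from disparity
    x = (u - cx) * z / f      # lateral offset from the optical axis
    return x, z


def ground_to_pixel(x, z, f, cx, cy, cam_height):
    """Project a flat-ground point (x, z) to pixel (u, v).

    IPM fills each BEV cell by sampling the image at this pixel. Assumes the
    camera sits cam_height metres above the ground, looking parallel to it.
    """
    x = np.asarray(x, dtype=float)
    z = np.asarray(z, dtype=float)
    u = cx + f * x / z
    v = cy + f * cam_height / z
    return u, v
```

For example, with `f=700`, `baseline=0.5`, and `cx=600`, a pixel in column 670 with disparity 35 lands at `x = 1.0`, `z = 10.0` in the BEV frame, and `ground_to_pixel(1.0, 10.0, 700, 600, 180, 1.5)` maps that ground point back to column 670. The first mapping needs a disparity estimate, while the second needs only the flat-ground assumption, which is consistent with the abstract's description of the two feature sources as complementary.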