Abstract: Temporal prediction is an important function in autonomous driving (AD) systems, as it forecasts how the environment will change in the next few seconds. Humans have an innate prediction capability that extrapolates a present scenario into the future. In this paper, we present a novel approach to look further into the future using a standard semantic segmentation representation and time series networks of varying architectures. An important property of our approach is its flexibility to predict an arbitrary time horizon into the future. We perform prediction in the semantic segmentation domain, where the inputs are semantic segmentation masks. We present extensive results and a discussion on different data dimensionalities that can prove beneficial for prediction on longer time horizons (up to \(2\,\textrm{s}\)). We also show results of our approach on two datasets widely employed in AD research, i.e., Cityscapes and BDD100K. We report two types of mIoU: one computed against self-generated ground truth labels (mIoU\(^\textrm{seg}\)) for both datasets, and one against actual ground truth labels (mIoU\(^\textrm{gt}\)) for a specific split of the Cityscapes dataset. Our method achieves \(57.12\%\) and \(83.95\%\) mIoU\(^\textrm{seg}\) on the validation splits of BDD100K and Cityscapes, respectively, for short-term time horizon predictions (up to \(0.2\,\textrm{s}\) and \(0.06\,\textrm{s}\)), outperforming the current state of the art on Cityscapes by \(13.71\%\) absolute. For long-term predictions (up to \(2\,\textrm{s}\) and \(0.6\,\textrm{s}\)), we achieve \(37.96\%\) and \(63.65\%\) mIoU\(^\textrm{seg}\) for BDD100K and Cityscapes, respectively.
Specifically, on the validation split of Cityscapes with perfect ground truth annotations, we achieve \(67.55\%\) and \(63.60\%\) mIoU\(^\textrm{gt}\), outperforming the current state of the art by \(1.45\%\) absolute and \(4.2\%\) absolute for time horizon predictions up to \(0.06\,\textrm{s}\) and \(0.18\,\textrm{s}\), respectively.