Abstract: Predicting the future behavior of road users is one of the most challenging and important problems
in autonomous driving. Applying deep learning to this problem requires fusing heterogeneous
world state in the form of rich perception signals and map information, and inferring highly multimodal distributions over possible futures. In this paper, we present MultiPath++, a future prediction
model that achieves state-of-the-art performance on popular benchmarks. MultiPath++ improves upon the
MultiPath architecture [45] by revisiting many design choices. The first key design difference is a
departure from dense image-based encoding of the input world state in favor of a sparse encoding of
heterogeneous scene elements: MultiPath++ consumes compact and efficient polylines to describe
road features, and raw agent state information directly (e.g., position, velocity, acceleration). We
propose a context-aware fusion of these elements and develop a reusable multi-context gating fusion
component. Second, we reconsider the choice of pre-defined, static anchors, and develop a way
to learn latent anchor embeddings end-to-end in the model. Lastly, we explore ensembling and
output aggregation techniques—common in other ML domains—and find effective variants for
our probabilistic multimodal output representation. We perform an extensive ablation study of these
design choices, and show that our proposed model achieves state-of-the-art performance on the
Argoverse Motion Forecasting Competition [12] and the Waymo Open Dataset Motion Prediction
Challenge [18].
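To make the fusion component concrete, below is a minimal sketch of multi-context gating, assuming PyTorch; the module names, layer sizes, and max-pooling choice are illustrative assumptions, not the paper's exact implementation. The key idea is that a set of scene elements (polylines, agent states) is gated elementwise by a context embedding, and the block is reusable and stackable:

```python
import torch
import torch.nn as nn

class ContextGating(nn.Module):
    """One context-gating block: element embeddings are gated elementwise
    by a transformed context vector, then pooled into an updated context."""
    def __init__(self, dim: int):
        super().__init__()
        self.elem_mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.ctx_mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, elems: torch.Tensor, ctx: torch.Tensor):
        # elems: [batch, num_elements, dim] (e.g., road polylines, agent states)
        # ctx:   [batch, dim]               (e.g., the target agent's embedding)
        gated = self.elem_mlp(elems) * self.ctx_mlp(ctx).unsqueeze(1)  # broadcast gate
        new_ctx = gated.max(dim=1).values  # permutation-invariant pooling
        return gated, new_ctx

class MultiContextGating(nn.Module):
    """Stacked context-gating blocks; the context is refined block by block."""
    def __init__(self, dim: int, num_blocks: int = 3):
        super().__init__()
        self.blocks = nn.ModuleList(ContextGating(dim) for _ in range(num_blocks))

    def forward(self, elems: torch.Tensor, ctx: torch.Tensor):
        for block in self.blocks:
            elems, ctx = block(elems, ctx)
        return elems, ctx
```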
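The learned latent anchors can similarly be pictured as trainable mode embeddings decoded jointly with the fused context. The sketch below (same PyTorch assumptions; the head shapes and names are hypothetical) illustrates only the idea of learning anchors end-to-end rather than fixing them a priori:

```python
import torch
import torch.nn as nn

class LatentAnchorDecoder(nn.Module):
    """Trainable mode embeddings stand in for MultiPath's fixed anchor
    trajectories; each anchor is decoded jointly with the fused context."""
    def __init__(self, dim: int, num_modes: int, horizon: int):
        super().__init__()
        self.anchors = nn.Parameter(torch.randn(num_modes, dim))  # learned end-to-end
        self.traj_head = nn.Linear(2 * dim, horizon * 2)  # (x, y) waypoint per step
        self.logit_head = nn.Linear(2 * dim, 1)           # mode likelihood

    def forward(self, ctx: torch.Tensor):
        # ctx: [batch, dim] fused scene/agent context from the encoder
        batch = ctx.shape[0]
        anchors = self.anchors.unsqueeze(0).expand(batch, -1, -1)  # [batch, modes, dim]
        joint = torch.cat([anchors, ctx.unsqueeze(1).expand_as(anchors)], dim=-1)
        trajs = self.traj_head(joint)                # [batch, modes, horizon * 2]
        logits = self.logit_head(joint).squeeze(-1)  # [batch, modes]
        return trajs, logits
```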
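Finally, one plausible reading of the output aggregation step, sketched under assumptions since the abstract does not specify the scheme, is a greedy, NMS-style grouping of the modes pooled across ensemble members; the endpoint-distance criterion and `radius` parameter here are purely illustrative:

```python
import numpy as np

def aggregate_modes(trajs, probs, k, radius=2.0):
    """Greedily select k output modes from the pooled ensemble modes,
    absorbing the probability mass of nearby modes.
    trajs: [n, horizon, 2] trajectories pooled across ensemble members.
    probs: [n] their likelihoods (renormalized below)."""
    trajs = np.asarray(trajs)
    probs = np.asarray(probs, dtype=float)
    probs = probs / probs.sum()
    absorbed = np.zeros(len(trajs), dtype=bool)
    keep, out_probs = [], []
    for _ in range(k):
        cand = np.flatnonzero(~absorbed)
        if cand.size == 0:
            break
        best = cand[np.argmax(probs[cand])]
        # group modes whose final waypoints lie within `radius` meters of the best
        dists = np.linalg.norm(trajs[cand, -1] - trajs[best, -1], axis=-1)
        group = cand[dists < radius]
        keep.append(best)
        out_probs.append(probs[group].sum())
        absorbed[group] = True
    return trajs[keep], np.asarray(out_probs)
```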