Keywords: Dynamics modeling, partial observations, multi-agent interactions, predictive models
TL;DR: We present a method which learns to integrate temporal information and ambiguous visual information in the context of interacting agents.
Abstract: We present a method which learns to integrate temporal information, from a learned dynamics model, with ambiguous visual information, from a learned vision model, in the context of interacting agents. Our method is based on a graph-structured variational recurrent neural network, which is trained end-to-end to infer the current state of the (partially observed) world, as well as to forecast future states. We show that our method outperforms various baselines on two sports datasets, one based on real basketball trajectories, and one generated by a soccer game engine.