Abstract: To reach human performance on complex tasks, a key ability for artificial
systems is to understand physical interactions between objects, and predict
future outcomes of a situation. This ability, often referred to as intuitive
physics, has recently received attention, and several methods have been proposed to
learn these physical rules from video sequences. Yet most of these methods are
restricted to the case where no occlusions occur, narrowing the potential areas
of application. The main contribution of this paper is a method that combines
a predictor of object dynamics with a neural renderer, efficiently predicting
future trajectories and explicitly modelling partial and full occlusions among
objects. We present a training procedure that enables learning intuitive physics
directly from the input videos containing segmentation masks of objects and
their depth. Our results show that our model learns object dynamics despite
significant inter-object occlusions, and realistically predicts segmentation
masks up to 30 frames in the future. We study model performance for
increasing levels of occlusions, and compare results to previous work on
the tasks of future prediction and object following. We also show results
on predicting motion of objects in real videos and demonstrate significant
improvements over the state of the art on the object permanence task in the
intuitive physics benchmark of Riochet et al. (2018).