Abstract: Assistive robots that operate alongside humans require the ability to understand and replicate human behaviours during a handover. A handover is defined as a joint action between two participants in which a giver hands an object over to a receiver. In this paper, we present a method for learning human-to-human handovers observed from motion capture data. Given the giver and receiver poses from a single timestep, and the object label in the form of a word embedding, our Multitask Variational Autoencoder jointly forecasts their poses at handover as well as the orientation of the object held by the giver. This stands in contrast to existing works on human pose forecasting, which employ deep autoregressive models that require a sequence of inputs. Furthermore, our method is novel in that it learns the human pose and the object orientation jointly. Experimental results on the publicly available Handover Orientation and Motion Capture Dataset show that our proposed method outperforms the autoregressive baselines for handover pose forecasting by approximately 20% while being on par for object orientation prediction, with a runtime that is 5x faster.
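As a rough illustration of the architecture described in the abstract, the following PyTorch sketch shows how a multitask conditional VAE of this kind might be structured: poses of both participants and the object word embedding go in, and two heads predict the handover poses and the object orientation. All layer sizes, input dimensions, and the quaternion parameterisation of orientation are assumptions made for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultitaskHandoverVAE(nn.Module):
    """Hypothetical sketch of a multitask conditional VAE for handover forecasting.

    Dimensions and layer sizes are illustrative assumptions, not the paper's values.
    """

    def __init__(self, pose_dim=63, embed_dim=300, latent_dim=32, hidden_dim=256):
        super().__init__()
        # Input: giver pose + receiver pose (single timestep) + object word embedding.
        in_dim = 2 * pose_dim + embed_dim
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        self.fc_mu = nn.Linear(hidden_dim, latent_dim)
        self.fc_logvar = nn.Linear(hidden_dim, latent_dim)
        # Decoder conditioned on the latent code and the object embedding.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim + embed_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        # Two task heads: handover poses of both agents, and object orientation (unit quaternion).
        self.pose_head = nn.Linear(hidden_dim, 2 * pose_dim)
        self.orient_head = nn.Linear(hidden_dim, 4)

    def forward(self, giver_pose, receiver_pose, obj_embed):
        x = torch.cat([giver_pose, receiver_pose, obj_embed], dim=-1)
        h = self.encoder(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        # Reparameterisation trick: sample the latent code differentiably.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        d = self.decoder(torch.cat([z, obj_embed], dim=-1))
        pose_pred = self.pose_head(d)
        orient_pred = F.normalize(self.orient_head(d), dim=-1)  # normalise to a unit quaternion
        return pose_pred, orient_pred, mu, logvar
```

Under these assumptions, training would combine a pose reconstruction loss, an orientation loss, and the usual KL term on (mu, logvar), weighted across the two tasks; since the model takes only a single timestep of input, inference is one forward pass rather than an autoregressive rollout, which is consistent with the reported runtime advantage.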