Abstract: For humans and robots to collaborate more closely in the real world, robots need to understand human intentions from the different manners in which people behave.
In this study, we focus on the meaning of adverbs that describe human motions.
We propose a topic model, Hierarchical Dirichlet Process-Spectral Mixture Latent Dirichlet Allocation,
which jointly learns the relationship between human motions and adverbs by capturing the frequency kernels that represent motion characteristics and the shared topics of the adverbs that describe those motions.
We trained the model on datasets that we constructed from movies of "walking" and "dancing" and found that it outperforms representative neural network models in terms of perplexity.
We also demonstrate the model’s ability to determine the adverbs for a given motion and confirm that it predicts more appropriate adverbs.
Paper Type: long
Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond
Languages Studied: Japanese