Exploiting Spatio-Temporal Human-Object Relations Using Graph Neural Networks for Human Action Recognition and 3D Motion Forecasting
Abstract: Human action recognition and motion forecasting are becoming increasingly successful, in particular through the use of graphs. We aim to transfer this success to the context of industrial Human-Robot Collaboration (HRC), where humans work closely with robots and interact with workpieces in defined workspaces. For this purpose, it is necessary to use all the information extractable in such a workspace and to represent it with a natural structure, such as a graph, that can be used for learning. Since humans are the center of HRC, the graph must be constructed in a human-centered way, using real-world 3D information as well as object labels to represent the environment. We therefore present a novel Graph Neural Network (GNN) architecture that combines human action recognition and motion forecasting for industrial HRC environments. We evaluate our method on two publicly available human action datasets, including one that is a particularly realistic representation of industrial HRC, and compare the results with baseline methods for classifying the current human action and predicting human motion. Our experiments show that our combined GNN approach improves action recognition accuracy over previous work, by up to 20% on the CoAx dataset. Furthermore, our motion forecasting approach outperforms existing baselines, predicting human trajectories with a Final Displacement Error (FDE) of less than 10 cm for a prediction horizon of 1 s.
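To make the human-centered graph construction and the FDE metric concrete, the following is a minimal sketch, not the authors' implementation, of how one frame of such a workspace graph might be built, assuming PyTorch Geometric. The node layout, object classes, and coordinates are illustrative placeholders: each node carries a 3D position concatenated with a one-hot object label, and every object node is connected to the human node rather than to other objects.

```python
import torch
from torch_geometric.data import Data

# Hypothetical single frame: one human hand node and two workpiece nodes.
# Node features = 3D position (x, y, z) + one-hot object-class label.
# Assumed class order: [human, workpiece_A, workpiece_B].
positions = torch.tensor([
    [0.10, 0.25, 0.90],   # human hand
    [0.40, 0.30, 0.85],   # workpiece A
    [0.55, 0.10, 0.85],   # workpiece B
])
labels = torch.eye(3)                       # one one-hot label per node
x = torch.cat([positions, labels], dim=1)   # node features, shape [3, 6]

# Human-centered topology: bidirectional edges between the human node (0)
# and each object node; objects are not connected to each other.
edge_index = torch.tensor([
    [0, 1, 0, 2],
    [1, 0, 2, 0],
])

frame_graph = Data(x=x, edge_index=edge_index)


def final_displacement_error(pred: torch.Tensor, gt: torch.Tensor) -> torch.Tensor:
    """FDE: Euclidean distance between the predicted and ground-truth
    3D positions at the final timestep of the forecast horizon.
    pred, gt: tensors of shape [T, 3]."""
    return torch.linalg.norm(pred[-1] - gt[-1])
```

A per-frame graph like this can then be stacked over time and fed to a spatio-temporal GNN; the FDE reported in the abstract corresponds to evaluating `final_displacement_error` at a 1 s horizon.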