Abstract: We propose an action estimation pipeline based on the simultaneous recognition of hands and objects in the scene from an egocentric perspective. A review of recent approaches led us to conclude that the hands are a key element from this point of view. An action consists of the interactions of the hands with the different objects in the scene; therefore, the 2D positions of the hands and the objects are used to compute which object is most likely being used. The architecture chosen for this task is YOLO, whose prediction speed allows us to estimate actions fluently while maintaining good accuracy on the detected objects and hands. After reviewing the available datasets and generators for hand and object detection, we conducted several experiments; the configurations achieving the best results under the Pascal VOC metric were incorporated into the proposed pipeline.
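The abstract states that the 2D positions of hands and objects determine which object is most likely being used, but does not give the exact rule. The sketch below is a minimal illustration assuming a nearest-centroid heuristic over detection boxes; the function names and box format are hypothetical, not from the paper.

```python
# Hedged sketch: pick the object whose bounding-box center is closest to
# any detected hand center. Assumes a nearest-centroid heuristic, which
# the abstract does not specify.
from math import hypot

def box_center(box):
    """Center (x, y) of a box given as (x_min, y_min, x_max, y_max)."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)

def most_likely_object(hand_boxes, object_boxes):
    """Return the (label, box) pair whose box center is nearest to any
    hand center, or None if either detection list is empty."""
    if not hand_boxes or not object_boxes:
        return None
    hand_centers = [box_center(b) for b in hand_boxes]

    def dist_to_nearest_hand(labeled_obj):
        ox, oy = box_center(labeled_obj[1])
        return min(hypot(ox - hx, oy - hy) for hx, hy in hand_centers)

    return min(object_boxes, key=dist_to_nearest_hand)

# Example detections: one hand near a cup, a phone farther away.
hands = [(100, 100, 140, 160)]
objects = [("cup", (130, 110, 170, 150)), ("phone", (300, 300, 340, 340))]
print(most_likely_object(hands, objects)[0])  # prints "cup"
```

In a full pipeline, the boxes would come from the YOLO detector's per-frame output, and the selected object could be smoothed over time before being fed to the action estimator.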