Abstract: Visuotactile sensors have recently attracted considerable attention in the robotics community because they provide tactile sensing at high spatial resolution. However, estimating force and torque with visuotactile sensors remains a challenging problem. In this paper, we propose a learning-based six-axis force/torque estimation network that uses the GelStereo visuotactile sensor, which provides both two-dimensional (2D) and three-dimensional (3D) displacements of markers embedded in the sensor surface. Convolutional neural networks are employed to extract multi-modal tactile deformation features, and a novel contact positional encoding method is proposed to counteract the translation invariance of convolutional operators, which would otherwise discard information about where contact occurs. The trained model achieves a best RMSE of 0.290 N in force and 0.0084 N·m in torque. Furthermore, the proposed force/torque estimation network is integrated with a force-feedback policy for adaptive grasping tasks. The experimental results demonstrate the effectiveness of the proposed method and its potential for robotic grasping and manipulation tasks.
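
To illustrate why contact position matters for a convolutional estimator, the sketch below appends normalized coordinate channels to a marker displacement field. This is a generic coordinate-channel technique used here only as an illustration; it is not claimed to be the contact positional encoding proposed in the paper. The function name `add_coordinate_channels`, the 5x5 marker grid, and the three displacement channels are all hypothetical choices made for the example.

```python
import numpy as np


def add_coordinate_channels(feature_map):
    """Append normalized row and column coordinate channels to a (C, H, W) array.

    Hypothetical helper for illustration only, not the paper's encoding.
    """
    _, h, w = feature_map.shape
    ys = np.linspace(-1.0, 1.0, h)
    xs = np.linspace(-1.0, 1.0, w)
    yy, xx = np.meshgrid(ys, xs, indexing="ij")  # each has shape (H, W)
    return np.concatenate([feature_map, yy[None], xx[None]], axis=0)


# Assumed input: 3 displacement channels (dx, dy, dz) on a 5x5 marker grid.
displacement = np.zeros((3, 5, 5))
displacement[:, 1, 1] = 1.0  # a contact near the top-left corner

# The same deformation pattern translated toward the bottom-right corner.
shifted = np.roll(displacement, shift=(2, 2), axis=(1, 2))

encoded = add_coordinate_channels(displacement)
encoded_shifted = add_coordinate_channels(shifted)

# The deformation values at the two contact sites are identical, but the
# appended coordinate channels differ, so the two contacts are distinguishable.
print(encoded.shape)
print(encoded[3:, 1, 1], encoded_shifted[3:, 3, 3])
```

Because torque depends on the lever arm between the contact point and the reference frame, two identical deformation patterns at different locations generally correspond to different torques. Explicit position channels are one simple way to expose that location to convolutional layers.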