KAN-ViT: A Visual-Tactile Fusion Learning Method for Grasping States Classification and Safe Force Inference
Keywords: Deformable Objects, Optical Flow, Grasping States Classification, Safe Force Inference
Abstract: To ensure stable and safe grasping during fine manipulation, a robot must accurately determine the grasping state and infer a safe operating force, especially when handling deformable objects. However, when grasping soft and light objects, the tactile feedback signal is very weak, making it difficult to classify the grasping states. To address this problem, we introduce optical flow information into tactile image feature processing to capture subtle dynamic variations in the tactile data. A multimodal optical flow dataset, OFB-6, is constructed to support feature-level fusion of the visual and tactile modalities. Furthermore, an improved end-to-end transformer architecture is proposed that integrates visual and tactile data for grasping states classification and safe force prediction. Specifically, a k-NN attention mechanism is employed to enhance grasping states classification accuracy, and the multilayer perceptrons (MLPs) are replaced with Kolmogorov-Arnold Network (KAN) layers to reduce computational complexity and improve time efficiency.
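The abstract does not specify how optical flow is extracted from tactile images; a minimal sketch of one standard approach (dense Farneback flow via OpenCV) is shown below. The function name `tactile_optical_flow` and all parameter values are illustrative assumptions, not the paper's implementation.

```python
import cv2
import numpy as np

def tactile_optical_flow(prev_frame: np.ndarray, next_frame: np.ndarray) -> np.ndarray:
    """Compute dense optical flow between two consecutive tactile images.

    Frames are H x W x 3 uint8 arrays; the returned flow field has shape
    H x W x 2 (per-pixel displacement in x and y), which can capture the
    subtle dynamic variations described in the abstract.
    """
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    next_gray = cv2.cvtColor(next_frame, cv2.COLOR_BGR2GRAY)
    # Farneback dense flow; window and pyramid settings are placeholders.
    flow = cv2.calcOpticalFlowFarneback(
        prev_gray, next_gray, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0,
    )
    return flow
```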
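For readers unfamiliar with k-NN attention, the sketch below shows the generic idea (each query attends only to its top-k most similar keys, with the rest masked before the softmax) in PyTorch. This is a common formulation of k-NN attention, not necessarily the exact variant used in KAN-ViT; the function name `knn_attention` is hypothetical.

```python
import torch
import torch.nn.functional as F

def knn_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor, top_k: int) -> torch.Tensor:
    """Sparse attention keeping only the top-k keys per query.

    q, k, v: (batch, heads, seq_len, dim). Scores below each query's
    k-th largest score are set to -inf, so they vanish after softmax.
    """
    scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5   # (B, H, N, N)
    topk_vals, _ = scores.topk(top_k, dim=-1)
    threshold = topk_vals[..., -1, None]                   # k-th largest score per query
    scores = scores.masked_fill(scores < threshold, float("-inf"))
    attn = F.softmax(scores, dim=-1)
    return attn @ v
```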
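Similarly, replacing an MLP with a KAN layer means learning univariate activation functions on edges rather than fixed activations on nodes. The sketch below uses a simplified Gaussian radial-basis parameterization of those edge functions (full KANs typically use B-splines plus a base branch); the class name `RBFKANLayer` and the basis choice are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class RBFKANLayer(nn.Module):
    """Simplified KAN layer: each input feature passes through learnable
    univariate functions, expressed as a linear mix of Gaussian radial
    basis functions, whose outputs are summed into each output feature.
    """
    def __init__(self, in_dim: int, out_dim: int, num_basis: int = 8):
        super().__init__()
        self.centers = nn.Parameter(torch.linspace(-2.0, 2.0, num_basis))
        self.log_width = nn.Parameter(torch.zeros(1))
        # One coefficient per (input, basis, output) edge function.
        self.coeffs = nn.Parameter(torch.randn(in_dim, num_basis, out_dim) * 0.1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, in_dim) -> phi: (batch, in_dim, num_basis)
        width = self.log_width.exp()
        phi = torch.exp(-((x.unsqueeze(-1) - self.centers) / width) ** 2)
        # Sum the learnable univariate edge functions over inputs and bases.
        return torch.einsum("bin,ino->bo", phi, self.coeffs)
```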
Submission Number: 6