KAN-ViT: A Visual-Tactile Fusion Learning Method for Grasping States Classification and Safe Force Inference

Published: 25 Sept 2025, Last Modified: 13 Oct 2025, IROS 2025 Workshop Tactile Sensing Poster, License: CC BY 4.0
Keywords: Deformable Objects, Optical Flow, Grasping States Classification, Safe Force Inference
Abstract: To ensure stable and safe grasping during fine manipulation, a robot must accurately determine the grasping state and infer a safe operating force, especially when grasping deformable objects. However, when grasping soft, lightweight objects, the tactile feedback signal is very weak, making the grasping states difficult to classify. To address this problem, we introduce optical flow information into tactile image feature processing to capture subtle dynamic variations in the tactile data. A multimodal optical flow dataset, OFB-6, is constructed to support feature-level fusion of the visual and tactile modalities. Furthermore, an improved end-to-end Transformer architecture is proposed that integrates visual and tactile data for grasping states classification and safe force prediction. Specifically, a k-NN attention mechanism is employed to improve grasping states classification accuracy, and the multilayer perceptrons (MLPs) are replaced with Kolmogorov-Arnold Network (KAN) layers to reduce computational complexity and improve time efficiency.
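For readers unfamiliar with k-NN attention, the sketch below illustrates the generic idea in PyTorch: each query attends only to its top-k most similar keys, with all other scores masked out before the softmax. This is a minimal sketch of the general technique, not the authors' implementation; the function name `knn_attention`, the tensor shapes, and the `top_k` value are all assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def knn_attention(q, k, v, top_k=16):
    """k-NN attention sketch: each query attends only to its top-k
    most similar keys; the remaining scores are masked to -inf so
    they receive zero weight after the softmax.

    q, k, v: tensors of shape (batch, heads, seq_len, dim).
    """
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5       # (B, H, N, N)
    # Find the k-th largest score per query row and mask everything below it.
    topk_vals, _ = scores.topk(top_k, dim=-1)
    threshold = topk_vals[..., -1, None]              # k-th largest score
    scores = scores.masked_fill(scores < threshold, float("-inf"))
    attn = F.softmax(scores, dim=-1)                  # sparse attention map
    return attn @ v

# Usage: 8 tokens per head, each attending to its 4 nearest keys.
q = k = v = torch.randn(2, 4, 8, 32)
out = knn_attention(q, k, v, top_k=4)
print(out.shape)  # torch.Size([2, 4, 8, 32])
```

Restricting each query to its nearest keys sparsifies the attention map and suppresses low-similarity interactions, which is consistent with the abstract's motivation of extracting reliable features from weak tactile signals.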
Submission Number: 6