TerraFusion: Semi-Supervised Vision-Proprioception Fusion for Robust Terrain Classification

Hongze Li, Rui Xie, Haotian Zhou, Jianghuan Xu, Jun Zhou, Wugen Zhou, Huijing Zhao, Hongbin Zha

Published: 01 Nov 2025 · Last Modified: 23 Oct 2025 · IEEE Robotics and Automation Letters · CC BY-SA 4.0
Abstract: Terrain classification is essential for traversability estimation and planning of autonomous ground vehicles (AGVs) in complex environments. Most existing approaches use fully supervised learning to classify terrains from either exteroceptive or proprioceptive sensor modalities. However, vision-based methods lose robustness under varying lighting conditions, while proprioception-based methods lose accuracy due to the sparse features inherent in proprioceptive signals. To address these limitations, we propose TerraFusion, a semi-supervised vision-proprioception fusion framework for terrain classification. The central idea is to exploit the representational richness of pretrained visual features as a supervisory signal that guides the learning of proprioceptive representations, and we introduce two training strategies that enable effective cross-modal alignment through contrastive learning. Final predictions are obtained via a Bayesian decision rule weighted by the confidence of each modality, yielding robust classification under diverse conditions. Experimental results on two datasets demonstrate that, under semi-supervised settings, the proposed method consistently outperforms baseline models in average performance across various environmental conditions. Moreover, the method can be deployed on edge devices for real-time terrain classification, demonstrating its practical value.
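The abstract does not spell out the fusion rule, so the following is only an illustrative sketch of one plausible reading: each modality's softmax output is assigned an entropy-based confidence, and the two posteriors are combined by confidence-weighted log-linear pooling before taking the arg-max. The confidence measure and the pooling form are assumptions, not the paper's actual method.

```python
import math

def confidence(probs):
    # Entropy-based confidence in [0, 1]: 1 minus normalized entropy.
    # Assumption: the paper's exact confidence measure is not given here.
    n = len(probs)
    ent = -sum(p * math.log(p) for p in probs if p > 0)
    return 1.0 - ent / math.log(n)

def fuse(p_vision, p_proprio):
    # Confidence-weighted product of posteriors (log-linear pooling),
    # renormalized over classes -- an illustrative Bayesian decision rule.
    w_v = confidence(p_vision)
    w_p = confidence(p_proprio)
    scores = [(pv ** w_v) * (pp ** w_p)
              for pv, pp in zip(p_vision, p_proprio)]
    z = sum(scores)
    return [s / z for s in scores]

# A confident proprioceptive prediction outweighs an uncertain visual one,
# e.g. a near-uniform low-light image vs. a clear vibration signature.
p_vision = [0.4, 0.35, 0.25]
p_proprio = [0.05, 0.9, 0.05]
fused = fuse(p_vision, p_proprio)
print(max(range(3), key=lambda i: fused[i]))  # predicted terrain class: 1
```

Down-weighting the near-uniform vision posterior lets the confident proprioceptive estimate dominate, which is the qualitative behavior the abstract attributes to its confidence-based fusion.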