Real-Time Semantic Segmentation in Natural Environments with SAM-assisted Sim-to-Real Domain Transfer

Published: 01 Jan 2024, Last Modified: 15 Jan 2025 · IROS 2024 · CC BY-SA 4.0
Abstract: Semantic segmentation plays a pivotal role in many robotic applications requiring high-level scene understanding, such as smart farming, where the precise identification of trees or plants can aid navigation and crop monitoring tasks. While deep-learning-based semantic segmentation approaches have reached outstanding performance in recent years, they demand large amounts of labeled data for training. Inspired by modern Unsupervised Domain Adaptation (UDA) techniques, in this paper we introduce a two-step training pipeline specifically tailored to challenging natural scenes, where annotated data is often scarce. Our strategy involves the initial training of a powerful domain-adaptive architecture, followed by a refinement stage in which segmentation masks predicted by the Segment Anything Model (SAM) are used to improve the accuracy of the predictions on the target dataset. These refined predictions serve as pseudo-labels to supervise the training of a final distilled architecture for real-time deployment. Extensive experiments conducted on two real-world scenes demonstrate the effectiveness of the proposed method. Specifically, we show that our pipeline enables the training of a MobileNetV3 that achieves significant mIoU gains of 3.60% and 11.40% on our two datasets compared to DAFormer, while requiring only 1/15 of the latter's inference time. Code and datasets are available at https://github.com/VIS4ROB-lab/nature_uda_rt_segmentation.
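To illustrate the kind of SAM-assisted refinement the abstract describes, the sketch below assumes a simple majority-vote rule: each SAM-generated mask is filled with the most frequent class predicted by the domain-adaptive teacher (e.g., DAFormer) inside that mask, and the refined map is used as a pseudo-label for a MobileNetV3-based student. This is only a hedged approximation of the idea; the paper's actual refinement rule, model configuration, checkpoint paths, and class count may differ (see the released code at the repository linked above), and the function names here are illustrative.

```python
import numpy as np
import torch
import torch.nn.functional as F
from segment_anything import SamAutomaticMaskGenerator, sam_model_registry
from torchvision.models.segmentation import lraspp_mobilenet_v3_large


def refine_with_sam(coarse_pred: np.ndarray, image: np.ndarray,
                    mask_generator: SamAutomaticMaskGenerator) -> np.ndarray:
    """Overwrite each SAM mask region with the majority class of the coarse
    teacher prediction inside that region (an assumed refinement heuristic,
    not necessarily the paper's exact rule)."""
    refined = coarse_pred.copy()
    for m in mask_generator.generate(image):      # m["segmentation"] is an HxW bool array
        region = m["segmentation"]
        labels, counts = np.unique(coarse_pred[region], return_counts=True)
        refined[region] = labels[np.argmax(counts)]
    return refined


# --- usage sketch (checkpoint path and class count are placeholders) ---
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h.pth")
mask_generator = SamAutomaticMaskGenerator(sam)

# coarse_pred: HxW int array from the domain-adaptive teacher
# image:       HxWx3 uint8 RGB image from the target scene
# pseudo_label = refine_with_sam(coarse_pred, image, mask_generator)

# The refined pseudo-labels then supervise a lightweight student for
# real-time deployment, e.g. a MobileNetV3-based segmentation network:
student = lraspp_mobilenet_v3_large(num_classes=5)   # num_classes is dataset-specific
# logits = student(image_batch)["out"]
# loss = F.cross_entropy(logits, pseudo_label_batch)
```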