B$^{3}$CT: Three-Branch Learning with Unlabeled Target Signals for Domain-Robust Semantic Segmentation

Chen Liang, Xin Zhao, Jian Jia, Junyan Wang, Lijun Cao, Jianguo Zhang, Weihua Chen

Published: 01 Mar 2026, Last Modified: 24 Feb 2026. Venue: International Journal of Computer Vision. License: CC BY-SA 4.0

Abstract: Semantic segmentation models often suffer from significant performance degradation when applied to unseen domains due to domain shifts. To address this challenge, we explore how to leverage unlabeled target-domain images during training to improve model robustness and generalization. Existing approaches primarily focus on achieving global alignment between source and target distributions, yet pay little attention to where and when such alignment should occur within the network. Through empirical observations, we find that different semantic contents are naturally aligned at different stages, and that alignment should be progressively enhanced as the quality of pseudo labels improves over training. Based on these insights, we propose a Three-Branch Coordinated Training (B$^{3}$CT) framework. In addition to conventional source and target branches, B$^{3}$CT introduces a dedicated alignment branch, where a hybrid-attention mechanism is used to guide feature-level consistency. To dynamically control the alignment strength, we design an Adaptive Alignment Controller (AAC) and a coordinate weighting strategy that modulates the alignment intensity according to the training progress. Extensive experiments on GTAV$\rightarrow$Cityscapes and SYNTHIA$\rightarrow$Cityscapes benchmarks demonstrate that our method achieves competitive performance and exhibits strong robustness to domain shifts.
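
The abstract describes an alignment term whose strength grows with training progress. The following is a minimal sketch of that general idea only, not the paper's AAC or its coordinate weighting strategy, whose details the abstract does not give. The function names, the exponential ramp-up shape, and the constant 5.0 are all assumptions chosen for illustration.

```python
import math


def alignment_weight(step: int, total_steps: int, max_weight: float = 1.0) -> float:
    """Hypothetical progress-dependent weight (assumed schedule, not from the paper).

    Starts near zero, when pseudo labels are presumably unreliable, and
    rises smoothly to max_weight at the end of training.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp progress to [0, 1] so out-of-range steps stay well defined.
    progress = min(max(step / total_steps, 0.0), 1.0)
    return max_weight * math.exp(-5.0 * (1.0 - progress) ** 2)


def combined_loss(source_loss: float, target_loss: float,
                  alignment_loss: float, step: int, total_steps: int) -> float:
    """Sum the three branch losses, scaling only the alignment branch."""
    weight = alignment_weight(step, total_steps)
    return source_loss + target_loss + weight * alignment_loss
```

In this sketch the source and target branch losses always contribute fully, while the alignment branch is scaled by a weight that increases monotonically over training, which mirrors the abstract's statement that alignment should be strengthened as pseudo-label quality improves.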

External IDs: doi:10.1007/s11263-026-02782-7