Abstract: Indoor RGB-D semantic segmentation is a new and challenging problem. Traditional methods usually apply two-stream convolutional neural networks (CNNs) to represent the RGB and depth images separately, and fuse the two streams at a single specific layer. In this paper, we explore several fusion strategies based on this two-stream CNN framework and point out that such single-layer fusion cannot fully exploit the complementary RGB and depth cues for semantic segmentation. To address this problem, we propose a novel Semantics-guided Multi-level feature fusion approach, which first learns deep feature representations in a bottom-up manner, and then gradually fuses the RGB and depth features from high level to low level under the guidance of semantic cues. Experimental results on the SUN RGB-D dataset demonstrate the advantages of the proposed method over the state of the art.
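The top-down fusion order described above can be illustrated with a minimal sketch. The abstract does not specify the fusion operator, so everything here is an assumption made for illustration: features are 1-D lists, each level is half the length of the one below it, the highest level is fused by plain averaging, and the upsampled higher-level result acts as the "semantic guidance" through a sigmoid gate. The names `upsample`, `guided_fuse`, and `top_down_fusion` are hypothetical and are not taken from the paper.

```python
import math


def upsample(xs, factor=2):
    """Nearest-neighbour upsampling of a 1-D feature list."""
    return [x for x in xs for _ in range(factor)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def guided_fuse(rgb, depth, guidance):
    """Blend RGB and depth features position by position.

    Assumption: the guidance signal is squashed into a gate in (0, 1)
    that weights RGB against depth. The paper may use a different operator.
    """
    gates = [sigmoid(g) for g in guidance]
    return [g * r + (1.0 - g) * d for g, r, d in zip(gates, rgb, depth)]


def top_down_fusion(rgb_levels, depth_levels):
    """Fuse two feature pyramids from the highest level down to the lowest.

    rgb_levels / depth_levels are ordered from low level (fine) to
    high level (coarse); each level is half the length of the previous one.
    Returns the fused features ordered from high level to low level.
    """
    # Highest level: no guidance exists yet, so simply average (assumption).
    fused = [(r + d) / 2.0 for r, d in zip(rgb_levels[-1], depth_levels[-1])]
    outputs = [fused]
    # Walk down the remaining levels, guiding each fusion with the
    # upsampled result from the level above.
    lower = zip(reversed(rgb_levels[:-1]), reversed(depth_levels[:-1]))
    for rgb, depth in lower:
        guidance = upsample(fused)
        fused = guided_fuse(rgb, depth, guidance)
        outputs.append(fused)
    return outputs


# Toy two-level pyramids: a fine level of length 4 and a coarse level of length 2.
rgb_levels = [[1.0, 1.0, 1.0, 1.0], [1.0, 1.0]]
depth_levels = [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0]]
fused_levels = top_down_fusion(rgb_levels, depth_levels)
```

In this toy setup the coarse level is fused first and then steers how much weight RGB receives at the fine level, in contrast to single-layer fusion, where the two streams are combined only once.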
External IDs: dblp:conf/icip/LiZCHT17