Abstract: Existing RGB-D saliency detection models do not explicitly encourage RGB and depth to achieve effective multi-modal learning. In this paper, we introduce a novel multi-stage cascaded learning framework via mutual information minimization to explicitly model the multi-modal information between the RGB image and depth data. Specifically, we first map the features of each modality to a lower-dimensional feature vector, and adopt mutual information minimization as a regularizer to reduce the redundancy between appearance features from RGB and geometric features from depth. We then perform multi-stage cascaded learning to impose the mutual information minimization constraint at every stage of the network. Extensive experiments on benchmark RGB-D saliency datasets illustrate the effectiveness of our framework. Further, to promote the development of this field, we contribute the largest (7× larger than NJU2K) COME15K dataset, which contains 15,625 image pairs with high-quality polygon-/scribble-/object-/instance-/rank-level annotations. Based on these rich labels, we additionally construct four new benchmarks with strong baselines and observe some interesting phenomena, which can motivate future model design. Source code and dataset are available at https://github.com/JingZhang617/cascaded_rgbd_sod.
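To make the abstract's mechanism concrete, below is a minimal PyTorch sketch of a per-stage mutual information minimization regularizer between RGB and depth features. The module names (`LatentEncoder`, `mi_upper_bound`), the Gaussian latent parameterization, and the CLUB-style shuffled-batch bound are illustrative assumptions, not the authors' exact formulation; the repository linked above contains the actual implementation.

```python
# Hedged sketch: project each modality's feature map to a low-dimensional
# Gaussian latent, then penalize a sample-based mutual-information surrogate
# between the two latents (CLUB-style upper bound; an assumed stand-in).
import torch
import torch.nn as nn

class LatentEncoder(nn.Module):
    """Maps a (B, C, H, W) modality feature map to a low-dimensional Gaussian latent."""
    def __init__(self, in_channels: int, latent_dim: int):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)           # collapse spatial dims
        self.mu = nn.Linear(in_channels, latent_dim)
        self.logvar = nn.Linear(in_channels, latent_dim)

    def forward(self, feat: torch.Tensor):
        h = self.pool(feat).flatten(1)                # (B, C)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterize
        return z, mu, logvar

def mi_upper_bound(z_rgb, mu_d, logvar_d):
    """Sample-based MI surrogate: conditional log-likelihood of the RGB latent
    under the depth latent's Gaussian, minus a marginal term estimated by
    shuffling the batch. Minimizing this reduces cross-modal redundancy."""
    def log_gauss(z, mu, logvar):
        return (-0.5 * (logvar + (z - mu) ** 2 / logvar.exp())).sum(dim=1)
    positive = log_gauss(z_rgb, mu_d, logvar_d)       # paired samples
    perm = torch.randperm(z_rgb.size(0))
    negative = log_gauss(z_rgb, mu_d[perm], logvar_d[perm])  # shuffled pairs
    return (positive - negative).mean()
```

In the cascaded setup described above, one such penalty would be added to the saliency loss at every stage of the network, e.g. `loss = saliency_loss + lam * mi_upper_bound(z_rgb, mu_d, logvar_d)`, with `lam` a tunable weight (a hypothetical usage, mirroring the per-stage constraint the abstract describes).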