Abstract: Scene labeling (SL) in images is an important component of the Visual Internet of Things (VIoT). Most existing SL approaches are fully supervised and require massive pixel-wise annotations, which are costly and time-consuming to obtain. To further reduce the amount of manually labeled data required, we propose a semi-supervised SL paradigm based on the joint optimization of deep representation and scene clustering. To learn a deep representation with a semantic-friendly distribution, we design a novel constrained clustering method composed of two steps: (1) over-clustering deep features into raw clusters with high self-consistency; (2) introducing sparse annotations as semantic constraints to merge raw clusters into scene clusters. Experimental results show that the proposed approach achieves satisfactory performance on the SIFT Flow and Stanford Background benchmarks while using very few annotations (0.1\% or less).
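
As a rough illustration of the two-step constrained clustering described in the abstract, the following toy sketch over-clusters feature vectors and then merges the raw clusters using a handful of labels. It is not the authors' implementation: plain k-means with a deterministic initialization stands in for over-clustering of deep features, a majority vote over annotated points stands in for the semantic constraints, and the joint optimization with the deep representation is omitted entirely. The function names `over_cluster` and `merge_by_annotations` and the toy data are hypothetical.

```python
from collections import Counter


def over_cluster(features, k, iters=20):
    """Toy stand-in for step (1): plain k-means with k larger than the
    number of scene classes. Initial centers are evenly spaced samples
    of the input (an assumption made to keep the sketch deterministic)."""
    n = len(features)
    centers = [list(features[i * n // k]) for i in range(k)]
    assign = [0] * n
    for _ in range(iters):
        # Assign every feature to its nearest center (squared Euclidean).
        for i, f in enumerate(features):
            assign[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(f, centers[c])),
            )
        # Move each center to the mean of its members; empty clusters stay put.
        for c in range(k):
            members = [features[i] for i in range(n) if assign[i] == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def merge_by_annotations(assign, sparse_labels):
    """Toy stand-in for step (2): give each raw cluster the majority label
    of its annotated members, so raw clusters sharing a label merge into
    one scene cluster. Raw clusters with no annotation map to None."""
    votes = {}
    for idx, label in sparse_labels.items():
        votes.setdefault(assign[idx], Counter())[label] += 1
    raw_to_scene = {c: v.most_common(1)[0][0] for c, v in votes.items()}
    return [raw_to_scene.get(c) for c in assign]


# Hypothetical 2-D "features": two well-separated groups of four points.
features = [(0, 0), (0, 1), (1, 0), (1, 1),
            (10, 10), (10, 11), (11, 10), (11, 11)]
raw = over_cluster(features, k=4)            # 4 raw clusters for 2 scenes
sparse = {0: "sky", 2: "sky", 4: "road", 7: "road"}  # index -> label
scene = merge_by_annotations(raw, sparse)
print(raw)
print(scene)
```

In this sketch every raw cluster happens to contain an annotated point. With annotation rates as low as those reported in the abstract, many raw clusters would receive no vote, so a real system would need an additional rule for them (for example, feature-space proximity to labeled clusters).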