Abstract: Human cognition robustly estimates the depth ordering of objects and the shape of their occluded regions, a capability related to amodal instance segmentation (AIS). Object-centric representation learning (OCRL) is an unsupervised approach to obtaining representations that mimic such human common sense, including amodal perception. Nevertheless, a significant gap remains between OCRL and human perception, leaving room for improvement in AIS. We aim to endow OCRL with amodal perception by jointly solving self-supervised contrastive learning for OCRL and depth-order estimation. The proposed method computes the training loss on two sets of masks built from the object representations extracted from the original image and from the image transformed by artificial occluders. Moreover, it efficiently acquires depth-aware estimation by solving the depth-ordering problem and representation learning simultaneously. We applied the proposed method to several simulation datasets and confirmed that its AIS accuracy achieves state-of-the-art (SOTA) performance under weakly supervised learning conditions.
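The abstract describes a loss computed over masks from an original image and an artificially occluded copy, plus a depth-ordering objective. A minimal NumPy sketch of this idea is given below; the function names, the masked-consistency form of the loss, and the margin-based ranking term are all assumptions for illustration, not the paper's actual formulation.

```python
import numpy as np

def mask_consistency_loss(masks_orig, masks_occl, visible):
    """Hypothetical consistency term: slot masks (K, H, W) predicted on the
    occluded image should agree with masks from the original image on
    pixels NOT covered by the artificial occluder (visible == 1)."""
    diff = (masks_orig - masks_occl) ** 2
    return float((diff * visible).sum() / (visible.sum() * masks_orig.shape[0] + 1e-8))

def depth_order_loss(scores, pairs):
    """Hypothetical pairwise depth-ordering term: for each (front, back)
    pair of slots, a margin ranking loss pushes scores[front] above
    scores[back] by at least 1."""
    return float(np.mean([max(0.0, 1.0 - (scores[f] - scores[b])) for f, b in pairs]))

# Toy example with K = 2 object slots on a 4x4 image.
rng = np.random.default_rng(0)
K, H, W = 2, 4, 4
masks_orig = rng.random((K, H, W))
occluder = np.zeros((H, W))
occluder[:2, :2] = 1.0                  # artificial occluder patch
visible = 1.0 - occluder
masks_occl = masks_orig * visible[None]  # stand-in for re-predicted masks

total = (mask_consistency_loss(masks_orig, masks_occl, visible)
         + depth_order_loss(np.array([2.0, 0.5]), [(0, 1)]))
```

In this toy case the two mask sets agree everywhere the occluder is absent and slot 0's depth score exceeds slot 1's by more than the margin, so both terms vanish; a real training loop would instead obtain both mask sets from the model and backpropagate through the combined loss.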
External IDs: dblp:conf/icassp/KanekoS0S25