Learning Object-Centric Representation via Reverse Hierarchy Guidance

Junhong Zou; Xiangyu Zhu; Zhaoxiang Zhang; Zhen Lei

23 Sept 2023 (modified: 11 Feb 2024). Submitted to ICLR 2024.

Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning

Keywords: Computer Vision; Reverse Hierarchy Theory; Object-Centric Learning

Abstract: Object-Centric Learning (OCL) seeks to enable neural networks to identify individual objects in a visual scene without supervision. The task matters because recognizing objects and understanding their relationships is the foundation of interpretable visual comprehension and reasoning. Because humans are highly adept at splitting visual scenes into sets of objects, incorporating mechanisms of human visual perception into the model architecture is a promising way to improve object representations. According to Reverse Hierarchy Theory (RHT), the human visual system comprises two opposing processes: a bottom-up process that rapidly extracts the gist of a scene and a top-down process that integrates detailed information into consciousness. Inspired by RHT, we propose the Reverse Hierarchy Guided Network (RHGNet), which enhances a model's object-centric representations through an additional top-down pathway of the kind described in RHT. This pathway allows more decisive semantic information to be included in the extracted low-level features and helps search for optimal solutions for distinguishing objects from low-level features. Experiments show that the model benefits from our method and differentiates objects, especially small and occluded ones that are easily ignored, better than current models that follow a purely bottom-up approach.

Supplementary Material: pdf

Submission Number: 7275
