Representing part-whole hierarchy with coordinated synchrony in neural networks

15 Sept 2023 (modified: 25 Mar 2024) · ICLR 2024 Conference Withdrawn Submission
Keywords: hierarchical representation, part whole hierarchy, neuronal coherence, synchrony code, metastability, spiking neural network, temporal binding, nested oscillation, object-centric representation, non-equilibrium states, neocortex, top-down modulation, hybrid neural network, unsupervised learning, self-supervised learning, representation learning, segmentation
TL;DR: Emergent nested synchrony structure in a cortical-like network model (ANN+SNN) represents the part-whole hierarchy of visual scenes.
Abstract: Human vision flexibly extracts part-whole hierarchies from visual scenes. However, how a neural network with a fixed architecture can parse an image into a part-whole hierarchy whose structure potentially differs for every image is a difficult question. This paper presents a new framework that represents the part-whole hierarchy through hierarchical neuronal synchrony: (1) neurons are dynamically synchronized into neuronal groups (at different timescales) to temporarily represent each object (whole, part, sub-part, etc.) as a node of the parse tree; (2) the coordinated temporal relationships among neuronal groups represent the structure (edges) of the parse tree. Further, we develop a simple two-level hybrid model inspired by the visual cortical circuit, the Composer, which dynamically reaches emergent coordinated synchronous states given an image. The synchrony states are gradually created by iterative top-down prediction and bottom-up integration, both between levels and within each level. For evaluation, we introduce four synthetic datasets and three quantitative metrics. The quantitative and qualitative results show that the Composer can parse scenes of varying complexity through dynamically formed neuronal synchrony. The systematic framework proposed in this paper, spanning representation, implementation, and evaluation, is a promising step toward developing human-like vision in neural network models.
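The core idea of the abstract, binding neurons into object-specific groups via synchrony, can be illustrated with a toy Kuramoto-style phase model. This is a minimal sketch of temporal binding in general, not the authors' Composer: the coupling matrix, step size, and the two-object feature assignment below are all illustrative assumptions. Oscillators with matching features couple positively and phase-lock, so each "object" ends up represented by one synchronous group.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup (not from the paper): six oscillators, two "objects"
# indicated by a shared feature label, three oscillators per object.
features = np.array([0, 0, 0, 1, 1, 1])

# Positive coupling within an object, no coupling across objects.
K = (features[:, None] == features[None, :]).astype(float)
np.fill_diagonal(K, 0.0)

phases = rng.uniform(0, 2 * np.pi, size=6)

# Kuramoto-style update: each phase drifts toward its coupled neighbors,
#   dphi_i = sum_j K_ij * sin(phi_j - phi_i)
for _ in range(500):
    dphi = (K * np.sin(phases[None, :] - phases[:, None])).sum(axis=1)
    phases = (phases + 0.1 * dphi) % (2 * np.pi)

def coherence(idx):
    """Phase coherence of a subset of oscillators (1.0 = fully in sync)."""
    return np.abs(np.exp(1j * phases[idx]).mean())

print(coherence([0, 1, 2]))   # within object A: near 1
print(coherence([3, 4, 5]))   # within object B: near 1
print(coherence(range(6)))    # across objects: depends on the groups' relative phase
```

In this toy picture, a synchronized group plays the role of one node of the parse tree; representing the tree's edges would additionally require coordinated timing relations between groups (e.g., nested oscillations), which is the part the paper's framework addresses.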
Supplementary Material: zip
Primary Area: representation learning for computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 297