Distilling Part-whole Hierarchical Knowledge from a Huge Pretrained Class Agnostic Segmentation Framework
Keywords: Deep learning, Knowledge distillation, Part-whole hierarchy, GLOM, Agglomerator
Abstract: We propose a novel approach for distilling visual knowledge from a large-scale pre-trained segmentation model, namely the Segment Anything Model (SAM). Our goal is to pre-train the Agglomerator, a recently introduced column-style network architecture inspired by the organization of neurons in the neocortex, to learn part-whole hierarchies in images. Despite its biological plausibility, we find that
the original pre-training strategy of the Agglomerator, which uses a supervised contrastive loss, fails to work effectively on natural images. To address this, we introduce a new pre-training strategy that instills the model with prior knowledge of the compositional nature of our world. Our approach divides the input image into patches and uses the center point of each patch to prompt SAM for segmentation masks. SAM produces three results per point to handle ambiguity at the whole, part, and sub-part levels. We then train a simple encoder that takes the intermediate feature maps of the Agglomerator and reconstructs the embeddings of the masks. This forces the network's intermediate features to learn objects and their constituent parts. By employing our pre-training strategy, we significantly improve classification performance on Imagenette, raising accuracy from 58.6% to 91.2% without relying on any augmentation. Remarkably, we achieve this with only 3.2 million parameters, roughly 54 times fewer than the originally proposed Agglomerator. These results demonstrate exceptional data and resource efficiency. Our code is available at: https://github.com/AhmedMostafaSoliman/distill-part-whole
Submission Number: 23