When Slots Compete: Slot Merging in Object-Centric Learning

Published: 02 Jun 2026, Last Modified: 21 Jun 2026
Venue: Greeks in AI 2026
License: CC BY 4.0
Keywords: Object Centric Learning, Slot Attention, Adaptive Refinement, Scene Decomposition
Domains: Vision and Learning
TL;DR: We introduce a lightweight slot merging operation with a fixed policy that merges overlapping slots based on Soft-IoU to reduce slot competition and improve object-centric representations.
External Link: https://arxiv.org/pdf/2603.11246
Abstract: Slot-based object-centric learning represents an image as a set of latent slots with a decoder that combines them into an image or features. The decoder specifies how slots are combined into an output, but the slot set is typically fixed: the number of slots is chosen upfront and slots are only refined. This can lead to multiple slots competing for overlapping regions of the same entity rather than focusing on distinct regions. We introduce slot merging: a drop-in, lightweight operation on the slot set that merges overlapping slots during training. We quantify overlap with a Soft-IoU score between slot-attention maps and combine selected pairs via a barycentric update that preserves gradient flow. Merging follows a fixed policy, with the decision threshold inferred from overlap statistics, requiring no additional learnable modules. Integrated into the established feature-reconstruction pipeline of DINOSAUR, the proposed method improves object factorization and mask quality, surpassing other adaptive methods in object discovery and segmentation benchmarks.
Keyword: Vision and Learning
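The merging step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and every specific below is an assumption made for illustration: Soft-IoU is taken as the ratio of summed element-wise minima to summed element-wise maxima of two attention maps, the barycentric update is taken as a convex combination of the two slot vectors weighted by attention mass, and the threshold is taken as the mean plus `k` standard deviations of the off-diagonal overlaps. The function names `soft_iou`, `overlap_threshold`, and `merge_most_overlapping` are hypothetical.

```python
import numpy as np


def soft_iou(attn):
    """Pairwise Soft-IoU between slot attention maps (assumed min/max form).

    attn: (K, N) non-negative array, one row per slot over N positions.
    Returns a (K, K) symmetric matrix of overlap scores in [0, 1].
    """
    inter = np.minimum(attn[:, None, :], attn[None, :, :]).sum(-1)
    union = np.maximum(attn[:, None, :], attn[None, :, :]).sum(-1)
    return inter / np.maximum(union, 1e-8)


def overlap_threshold(iou, k=1.0):
    """Assumed rule: mean + k * std of the off-diagonal overlap scores."""
    off_diag = iou[~np.eye(iou.shape[0], dtype=bool)]
    return off_diag.mean() + k * off_diag.std()


def merge_most_overlapping(slots, attn, threshold):
    """Merge the single most-overlapping slot pair if it reaches the threshold.

    slots: (K, D) slot vectors; attn: (K, N) attention maps.
    Returns the (possibly smaller) slot set and attention maps.
    """
    iou = soft_iou(attn)
    np.fill_diagonal(iou, -np.inf)  # ignore self-overlap
    i, j = np.unravel_index(np.argmax(iou), iou.shape)
    if iou[i, j] < threshold:
        return slots, attn  # nothing overlaps enough; leave the set unchanged

    # Barycentric (convex) combination, weighted by attention mass (assumption).
    mass_i, mass_j = attn[i].sum(), attn[j].sum()
    w = mass_i / (mass_i + mass_j + 1e-8)
    merged_slot = w * slots[i] + (1.0 - w) * slots[j]
    merged_attn = attn[i] + attn[j]

    # Drop the two originals and append the merged slot at the end.
    keep = [s for s in range(slots.shape[0]) if s not in (i, j)]
    new_slots = np.vstack([slots[keep], merged_slot[None]])
    new_attn = np.vstack([attn[keep], merged_attn[None]])
    return new_slots, new_attn
```

Because the merged slot is a convex combination of its two parents, the same operation written in an autodiff framework would pass gradients to both, which is consistent with the abstract's remark about preserving gradient flow. In this sketch only one pair is merged per call; a fixed policy that merges several pairs would call it repeatedly or select multiple pairs at once.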
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 1