MOCA: Self-supervised Representation Learning by Predicting Masked Online Codebook Assignments

Spyros Gidaris; Andrei Bursuc; Oriane Siméoni; Antonín Vobecký; Nikos Komodakis; Matthieu Cord; Patrick Perez

Published: 02 Feb 2024, Last Modified: 09 May 2025. Accepted by TMLR. CC BY 4.0

Event Certifications: iclr.cc/ICLR/2025/Journal_Track

Abstract: Self-supervised learning can mitigate the need of Vision Transformer networks for very large fully-annotated datasets. Different classes of self-supervised learning offer representations with either good contextual reasoning properties, e.g., using masked image modeling strategies, or invariance to image perturbations, e.g., with contrastive methods. In this work, we propose a single-stage and standalone method, MOCA, which unifies both desired properties using novel mask-and-predict objectives defined with high-level features (instead of pixel-level details). Moreover, we show how to effectively employ both learning paradigms in a synergistic and computation-efficient way. In doing so, we achieve new state-of-the-art results in low-shot settings and strong experimental results in various evaluation protocols, with training that is at least 3 times faster than prior methods. We provide the implementation code at https://github.com/valeoai/MOCA.

Submission Length: Regular submission (no more than 12 pages of main content)

Previous TMLR Submission Url: https://openreview.net/forum?id=t1Na2oVyU4&referrer=%5BAuthor%20Console%5D(%2Fgroup%3Fid%3DTMLR%2FAuthors%23your-submissions)

Changes Since Last Submission:
+ Small re-structuring of the approach section and a bit longer discussion on the codebook construction and the dynamic prototype generation modules
+ Additional results with ablations on the Cityscapes semantic segmentation task (Sec. 3.2) and using farthest point sampling for the codebook updates (Sec. A.2)
+ Moving the COCO results from appendix to main paper
+ Additional implementation details in the appendix
+ Adding acknowledgements.

Code: https://github.com/valeoai/MOCA

Assigned Action Editor: ~Joao_Carreira1

License: Creative Commons Attribution 4.0 International (CC BY 4.0)

Submission Number: 1583
