Separating Knowledge and Perception with Procedural Data

Published: 01 May 2025, Last Modified: 18 Jun 2025. ICML 2025 poster. CC BY 4.0.
TL;DR: Improve on visual memory approaches by training on procedural data, ensuring full compartmentalization of, and control over, all real data.
Abstract: We train representation models with procedural data only, and apply them on visual similarity, classification, and semantic segmentation tasks without further training by using visual memory---an explicit database of reference image embeddings. Unlike prior work on visual memory, our approach achieves full compartmentalization with respect to all real-world images while retaining strong performance. Compared to a model trained on Places, our procedural model performs within 1\% on NIGHTS visual similarity, outperforms by 8\% and 15\% on CUB200 and Flowers102 fine-grained classification, and is within 10\% on ImageNet-1K classification. It also demonstrates strong zero-shot segmentation, achieving an $R^2$ on COCO within 10\% of the models trained on real data. Finally, we analyze procedural versus real data models, showing that parts of the same object have dissimilar representations in procedural models, resulting in incorrect searches in memory and explaining the remaining performance gap.
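The abstract's core mechanism is a visual memory: an explicit database of reference image embeddings that is queried at inference time instead of fine-tuning the model. A minimal sketch of that idea, using plain nearest-neighbour lookup with cosine similarity over NumPy arrays (the function names, the choice of k, and majority voting are illustrative assumptions, not the paper's exact pipeline):

```python
import numpy as np

def build_memory(reference_embeddings, labels):
    """Visual memory: reference embeddings plus their labels.

    Editing the memory (adding or deleting entries) changes model behaviour
    without any retraining -- the compartmentalization the paper relies on.
    """
    emb = np.asarray(reference_embeddings, dtype=np.float64)
    # L2-normalise rows so that dot products become cosine similarities
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    return emb, np.asarray(labels)

def classify(query_embedding, memory, k=5):
    """Label a query by majority vote over its k nearest neighbours in memory."""
    emb, labels = memory
    q = np.asarray(query_embedding, dtype=np.float64)
    q = q / np.linalg.norm(q)
    sims = emb @ q                 # cosine similarity to every stored reference
    top = np.argsort(-sims)[:k]    # indices of the k most similar references
    votes = labels[top]
    values, counts = np.unique(votes, return_counts=True)
    return values[np.argmax(counts)]
```

In the paper's setting the embeddings come from a representation model trained only on procedural data; here they would simply be whatever vectors that encoder produces for the reference and query images.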
Lay Summary: Standard machine learning approaches train vision models with real-world images, which makes it difficult to learn (and forget) knowledge and raises privacy and interpretability concerns. In this work, we train vision models with non-realistic images generated with code, and use real-world images only through an external memory database. This external memory is easily editable, making the overall models interpretable, flexible, and private. Moreover, despite being trained on non-realistic data, the models achieve strong performance. Our work contributes towards making vision models more private and interpretable.
Primary Area: Applications->Computer Vision
Keywords: computer vision, visual memory, procedural data, deep learning
Submission Number: 4198