Keywords: Interpretability, non-parametric model, generative visual model, n-gram, representation learning
TL;DR: The paper presents a training-free, transparent image generation model that leverages simple natural image principles to produce high-quality samples and reveal how generative models generalize.
Abstract: Scaling and architectural advances have produced strikingly photorealistic image generative models, yet their mechanisms remain opaque. Rather than pushing scaling further, we strip away complicated engineering tricks and propose a simple, non-parametric conditional generative model. Our design is grounded in three principles of natural images: (i) spatial non-stationarity, (ii) low-level regularities, and (iii) high-level semantics. It defines each pixel's distribution from its local context window. Despite its minimal architecture and lack of training, the model produces high-fidelity MNIST samples and visually compelling CIFAR-10 images. This combination of simplicity and strong empirical performance points toward a minimal theory of natural-image structure. The model's white-box nature also yields a mechanistic understanding of how it generalizes and generates diverse images: we analyze how each pixel is generated by tracing every generated pixel back to its source images. These analyses reveal a simple, compositional procedure for "part-whole generalization," and suggest a hypothesis for how large neural network generative models learn to generalize.
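To make the abstract's mechanism concrete, the following is a minimal sketch (not the paper's actual algorithm) of a training-free, non-parametric pixel sampler: each pixel is copied from a dataset image whose causal context window at the same spatial position best matches the pixels generated so far, and the source image index is recorded so every pixel can be traced back to its origin. All names (`causal_context`, `generate`) and the exact matching rule are illustrative assumptions.

```python
import numpy as np

def causal_context(img, r, c, k):
    """Flatten the k x k window ending at (r, c), keeping only pixels that
    precede (r, c) in raster order (out-of-bounds or future pixels are -1).
    Comparing contexts only at the same (r, c) reflects spatial non-stationarity."""
    patch = np.full((k, k), -1, dtype=np.int64)
    for dr in range(k):
        for dc in range(k):
            rr, cc = r - (k - 1) + dr, c - (k - 1) + dc
            if 0 <= rr < img.shape[0] and 0 <= cc < img.shape[1]:
                if (rr, cc) < (r, c):  # raster-order causality
                    patch[dr, dc] = img[rr, cc]
    return patch.ravel()

def generate(dataset, shape, k=2, rng=None):
    """Generate one image pixel by pixel; also return a provenance map giving,
    for each generated pixel, the index of the dataset image it was copied from."""
    rng = np.random.default_rng(rng)
    out = np.zeros(shape, dtype=np.int64)
    provenance = np.zeros(shape, dtype=np.int64)
    for r in range(shape[0]):
        for c in range(shape[1]):
            ctx = causal_context(out, r, c, k)
            # score every dataset image by context agreement at position (r, c)
            scores = np.array([np.sum(causal_context(img, r, c, k) == ctx)
                               for img in dataset])
            best = np.flatnonzero(scores == scores.max())
            src = rng.choice(best)           # break ties at random
            out[r, c] = dataset[src][r, c]   # copy the pixel, record its source
            provenance[r, c] = src
    return out, provenance
```

With two constant training images, the first (tied) pixel choice commits the sampler to one source, and all later contexts match that source best, so the output is a coherent copy rather than a pixel-level mixture; the provenance map is exactly the trace-back analysis the abstract describes.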
Primary Area: interpretability and explainable AI
Submission Number: 8114