A Geometric Framework for Understanding Memorization in Generative Models

Published: 28 Jun 2024 · Last Modified: 25 Jul 2024 · NextGenAISafety 2024 Poster · License: CC BY 4.0
Keywords: deep generative models, memorization, geometry, local intrinsic dimension, diffusion, diffusion models
TL;DR: We show that the manifold hypothesis and local intrinsic dimension (LID) provide a useful lens for reasoning about memorization in DGMs.
Abstract: As deep generative models have progressed, recent work has shown that they are capable of memorizing and reproducing training datapoints when deployed. These findings call into question the usability of generative models, especially in light of the legal and privacy risks brought about by memorization. To better understand this phenomenon, we propose a geometric framework which leverages the manifold hypothesis to provide a clear language in which to reason about memorization. Specifically, we analyze memorization in terms of the relationship between the dimensionalities of $(i)$ the ground truth data manifold and $(ii)$ the manifold learned by the model. In preliminary tests on toy examples and Stable Diffusion (Rombach et al., 2022), we show that our theoretical framework accurately describes reality. Furthermore, by analyzing prior work in the context of our geometric framework, we explain and unify assorted observations in the literature and illuminate promising directions for future research on memorization.
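The framework centres on comparing local intrinsic dimension (LID) around a point on the ground truth data manifold with LID on the manifold learned by the model; under this view, a memorized training point is one where the model's manifold is locally lower-dimensional than the true one. As a purely illustrative aid (not the estimator used in the paper), the sketch below computes the classical Levina-Bickel maximum-likelihood LID estimate from nearest-neighbour distances; the function name `lid_mle`, the neighbourhood size `k`, and the toy data are all hypothetical choices for this example.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def lid_mle(data, queries, k=20):
    """Levina-Bickel maximum-likelihood estimate of local intrinsic
    dimension (LID) at each query point, from k nearest-neighbour
    distances within `data`. Illustrative sketch only."""
    # Fetch k + 1 neighbours so the zero self-distance can be dropped
    # when the queries are themselves members of `data`.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(data)
    dists, _ = nn.kneighbors(queries)   # shape (m, k+1), ascending order
    dists = dists[:, 1:]                # drop the self-match
    # LID_hat(x) = [ (1/(k-1)) * sum_{j<k} log(T_k / T_j) ]^{-1}
    log_ratios = np.log(dists[:, -1:] / dists[:, :-1])
    return (k - 1) / log_ratios.sum(axis=1)

# Toy check: points on a 2-D plane embedded in 10-D should give LID ~ 2.
rng = np.random.default_rng(0)
plane = rng.normal(size=(5000, 2)) @ rng.normal(size=(2, 10))
print(lid_mle(plane, plane[:5]))  # values close to 2
```

Applied once to samples from the ground truth manifold and once to samples drawn from the model near a suspect training point, two such estimates give the dimensionality comparison the abstract describes; a collapse of the model-side estimate below the data-side one would flag memorization under this framework.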
Submission Number: 136