Seeing Through the Facade: Understanding the Realism, Expressivity, and Limitations of Diffusion Models
Keywords: Diffusion Model, Image Classification, Deepfakes, Computer Vision, Machine Learning, ICML
TL;DR: We offer insights into the limitations of existing diffusion models and potential areas for improvement in photorealistic image generation.
Abstract: While text-to-image generation models such as DALLE-2 and Stable Diffusion 2.0 have captured the public psyche with their ability to create photorealistic images, just how "fake" are their outputs? To investigate this question, we present a three-pronged process for extracting insights from diffusion models. First, we show strong results in classifying real vs. fake images by using transfer learning with a nearly decade-old model, which sets an initial benchmark of realism that these models have not yet met. Second, after visualizing the classifier's inference decisions, we conclude that concrete, singular subject objects, such as buildings and hands, helped distinguish real from fake images. However, we found no consensus on which features were distinctive of DALLE-2 versus Stable Diffusion. Finally, after dissecting the prompts used to generate fake images, we found that prompts that failed to trick our classifier contained similar types of nouns, while the prompts that succeeded in tricking it differed for each model. We believe our work can serve as the first step in an iterative process that continuously establishes increasingly difficult benchmarks of realism for diffusion models to overcome. The code for our project is open source: https://github.com/cpondoc/diffusion-model-analysis.
Submission Number: 49