Keywords: Image Generation, Freehand Sketches
Abstract: Recent years have witnessed remarkable progress in generative AI, with natural language emerging as the most common conditioning input. As underlying models grow more powerful, researchers are exploring increasingly diverse conditioning signals -- such as depth maps, edge maps, camera parameters, and reference images -- to give users finer control over generation. Among these modalities, sketches constitute a natural and long-standing form of human communication, enabling rapid expression of visual concepts. Yet algorithms that effectively handle true freehand sketches -- with their inherent abstraction and distortions -- remain largely unexplored.
In this work, we distinguish between edge maps, often regarded as “sketches” in the literature, and genuine freehand sketches. We pursue the challenging goal of balancing photorealism with sketch adherence when generating images from freehand input. A key obstacle is the absence of ground-truth, pixel-aligned images: by their nature, freehand sketches do not have a single correct alignment. To address this, we propose a modulation-based approach that prioritizes semantic interpretation of the sketch over strict adherence to individual edge positions. We further introduce a novel loss that enables training on freehand sketches without requiring ground-truth pixel-aligned images. We show that our method outperforms existing approaches both in semantic alignment with freehand sketch inputs and in the realism and overall quality of the generated images.
Primary Area: generative models
Submission Number: 17146