CLIPDraw: Exploring Text-to-Drawing Synthesis through Language-Image Encoders

Kevin Frans; Lisa Soros; Olaf Witkowski

Published: 31 Oct 2022, Last Modified: 06 Apr 2025 | NeurIPS 2022 Accept | Readers: Everyone

Keywords: image synthesis, clip, computer vision, language to text, creativity, art

Abstract: CLIPDraw is an algorithm that synthesizes novel drawings from natural language input. It does not require any additional training; rather, a pre-trained CLIP language-image encoder is used as a metric for maximizing similarity between the given description and a generated drawing. Crucially, CLIPDraw operates over vector strokes rather than pixel images, which biases drawings towards simpler human-recognizable shapes. The results compare CLIPDraw with other synthesis-through-optimization methods and highlight several notable behaviors of CLIPDraw.

TL;DR: CLIPDraw is an algorithm that synthesizes novel drawings from natural language input.
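The synthesis-through-optimization idea in the abstract can be illustrated with a toy loop: stroke parameters are adjusted so that the embedding of the rendered drawing moves closer to a target embedding, while the encoder itself stays fixed. The sketch below is not the authors' implementation and makes several loud assumptions: a fixed random projection (`PROJECTION`) stands in for the pre-trained CLIP image encoder, the target embedding comes from a hypothetical reference drawing rather than from a text prompt, the soft line rasterizer `render` is invented for illustration, and gradients are estimated by finite differences, where a real system would presumably use automatic differentiation.

```python
import math
import random

SIZE = 8        # canvas is SIZE x SIZE pixels (tiny, for illustration only)
SIGMA = 0.08    # stroke softness in canvas units
EMBED_DIM = 8   # dimension of the stand-in embedding

_rng = random.Random(0)
# ASSUMPTION: a fixed random projection stands in for a frozen, pre-trained encoder.
PROJECTION = [[_rng.gauss(0.0, 1.0) for _ in range(SIZE * SIZE)]
              for _ in range(EMBED_DIM)]


def render(params):
    """Softly rasterize line segments; params is a flat list [x0, y0, x1, y1, ...]."""
    image = []
    for row in range(SIZE):
        for col in range(SIZE):
            px, py = (col + 0.5) / SIZE, (row + 0.5) / SIZE
            value = 0.0
            for i in range(0, len(params), 4):
                x0, y0, x1, y1 = params[i:i + 4]
                dx, dy = x1 - x0, y1 - y0
                length_sq = dx * dx + dy * dy + 1e-12
                # Closest point on the segment to the pixel centre.
                t = max(0.0, min(1.0, ((px - x0) * dx + (py - y0) * dy) / length_sq))
                dist_sq = (px - x0 - t * dx) ** 2 + (py - y0 - t * dy) ** 2
                value += math.exp(-dist_sq / (2 * SIGMA ** 2))
            image.append(value)
    return image


def encode(image):
    """Stand-in image encoder: project the flattened image to EMBED_DIM values."""
    return [sum(w * p for w, p in zip(row, image)) for row in PROJECTION]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / (norm + 1e-12)


def similarity(params, target):
    """The metric being maximized: similarity of the drawing to the target."""
    return cosine(encode(render(params)), target)


def optimize(params, target, steps=30, lr=0.1, eps=1e-4):
    """Gradient ascent on stroke parameters; only improving steps are accepted."""
    params = list(params)
    best = similarity(params, target)
    for _ in range(steps):
        # Central finite differences approximate the gradient of the metric.
        grad = []
        for i in range(len(params)):
            hi = params[:i] + [params[i] + eps] + params[i + 1:]
            lo = params[:i] + [params[i] - eps] + params[i + 1:]
            grad.append((similarity(hi, target) - similarity(lo, target)) / (2 * eps))
        # Halve the step until the similarity improves, or give up on this step.
        step = lr
        for _ in range(8):
            candidate = [p + step * g for p, g in zip(params, grad)]
            score = similarity(candidate, target)
            if score > best:
                params, best = candidate, score
                break
            step /= 2
    return params, best


# ASSUMPTION: the embedding of a reference diagonal stroke replaces a text embedding.
target_embedding = encode(render([0.1, 0.1, 0.9, 0.9]))
start_rng = random.Random(1)
initial_params = [start_rng.random() for _ in range(8)]  # two random strokes
initial_score = similarity(initial_params, target_embedding)
final_params, final_score = optimize(initial_params, target_embedding)
print(f"similarity: {initial_score:.3f} -> {final_score:.3f}")
```

Because the search variables are stroke endpoints rather than individual pixels, every candidate drawing is a small set of lines by construction, which is one way to read the abstract's remark that vector strokes bias results towards simpler shapes.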

Supplementary Material: zip

Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/clipdraw-exploring-text-to-drawing-synthesis/code)
