Common Canvas: Open Diffusion Models Trained on Creative-Commons Images

Published: 01 Jan 2024 · Last Modified: 23 Mar 2025 · CVPR 2024 · License: CC BY-SA 4.0
Abstract: We train a set of open, text-to-image (T2I) diffusion models on a dataset of curated Creative-Commons-licensed (CC) images, which yields models that are competitive with Stable Diffusion 2 (SD2). This task presents two challenges: (1) high-resolution CC images lack the captions necessary to train T2I models; (2) CC images are relatively scarce. To address these challenges, we use an intuitive transfer learning technique to produce a set of high-quality synthetic captions paired with our assembled CC images. We then develop a data- and compute-efficient training recipe that requires as little as 3% of the LAION data (i.e., roughly 70 million examples) needed to train existing SD2 models, but obtains the same quality. These results indicate that we have a sufficient number of CC images (also roughly 70 million) for training high-quality models. Our recipe also implements a variety of optimizations that achieve 2.71× training speed-ups, enabling rapid model iteration. We leverage this recipe to train several high-quality T2I models, which we dub the CommonCanvas family. Our largest model achieves comparable performance to SD2 on human evaluation, even though we use a synthetically captioned CC-image dataset that is only <3% the size of LAION for training. We release our models, data, and code on GitHub.
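The captioning step described in the abstract can be pictured with a minimal sketch like the one below: captionless CC images are paired with synthetic captions produced by an off-the-shelf pretrained captioning model, and the resulting (image, caption) pairs become T2I training data. The choice of captioning model (BLIP-2 via Hugging Face transformers), the checkpoint name, and the file paths are illustrative assumptions, not details taken from the abstract.

```python
# Hypothetical sketch: generate synthetic captions for captionless CC images
# with a pretrained captioning model, producing (image, caption) training pairs.
# BLIP-2 and the checkpoint name below are assumptions for illustration only.
import torch
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

device = "cuda" if torch.cuda.is_available() else "cpu"
checkpoint = "Salesforce/blip2-opt-2.7b"  # assumed captioning checkpoint
processor = Blip2Processor.from_pretrained(checkpoint)
model = Blip2ForConditionalGeneration.from_pretrained(checkpoint).to(device)

def caption_image(path: str, max_new_tokens: int = 30) -> str:
    """Return a synthetic caption for one (captionless) CC image."""
    image = Image.open(path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt").to(device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return processor.batch_decode(output_ids, skip_special_tokens=True)[0].strip()

if __name__ == "__main__":
    # Placeholder paths; in practice this loop would run over the full CC corpus.
    for path in ["cc_image_0001.jpg", "cc_image_0002.jpg"]:
        print(path, "->", caption_image(path))
```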