Generating Interpretable Images with Controllable Structure
Scott Reed, Aäron van den Oord, Nal Kalchbrenner, Victor Bapst, Matt Botvinick, Nando de Freitas
Nov 04, 2016 (modified: Jan 18, 2017) · ICLR 2017 conference submission
Abstract: We demonstrate improved text-to-image synthesis with controllable object locations using an extension of Pixel Convolutional Neural Networks (PixelCNN). In addition to conditioning on text, we show how the model can generate images conditioned on part keypoints and segmentation masks. The character-level text encoder and image generation network are jointly trained end-to-end via maximum likelihood. We establish quantitative baselines in terms of text- and structure-conditional pixel log-likelihood for three data sets: Caltech-UCSD Birds (CUB), MPII Human Pose (MHP), and Common Objects in Context (MS-COCO).
TL;DR: Autoregressive text-to-image synthesis with controllable spatial structure.
Keywords: Deep learning, Computer vision, Multi-modal learning, Natural language processing
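The abstract builds on the conditional PixelCNN, whose core mechanism is a causally masked convolution combined with a gated activation that injects a conditioning vector (here, a text embedding or encoded spatial structure) as an additive bias. The sketch below is a minimal NumPy illustration of that mechanism, not the authors' implementation; the kernel sizes, the single-channel setup, and the function names are illustrative assumptions.

```python
import numpy as np

def causal_mask(k, mask_type="A"):
    """PixelCNN-style mask for a k x k kernel.

    Type 'A' (first layer) hides the centre pixel as well;
    type 'B' (later layers) keeps it visible.
    """
    m = np.zeros((k, k))
    c = k // 2
    m[:c, :] = 1       # all rows above the centre row
    m[c, :c] = 1       # pixels left of centre in the centre row
    if mask_type == "B":
        m[c, c] = 1
    return m

def masked_conv2d(x, w, mask_type="A"):
    """Same-size masked convolution on a single-channel image x."""
    k = w.shape[0]
    w = w * causal_mask(k, mask_type)
    pad = k // 2
    xp = np.pad(x, pad)
    out = np.zeros_like(x, dtype=float)
    H, W = x.shape
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(xp[i:i + k, j:j + k] * w)
    return out

def gated_cond_layer(x, w_f, w_g, v_f, v_g, h):
    """Conditional gated activation:
    tanh(conv_f(x) + v_f . h) * sigmoid(conv_g(x) + v_g . h),
    where h is the conditioning vector (e.g. a text embedding).
    """
    f = masked_conv2d(x, w_f, "B") + v_f @ h
    g = masked_conv2d(x, w_g, "B") + v_g @ h
    return np.tanh(f) * (1.0 / (1.0 + np.exp(-g)))
```

The mask enforces the autoregressive ordering (each output pixel sees only pixels above it and to its left), while the `v @ h` terms let the same layer be steered by text or structure conditioning without breaking that ordering.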