One Picture and a Thousand Words: Generative Language+images Models and How to Train Them

Roberto Zamparelli

Published: 08 Nov 2023, Last Modified: 19 Jan 2024. NL4AI 2023: Seventh Workshop on Natural Language for Artificial Intelligence, November 6-7, 2023, Rome, Italy. License: CC BY 4.0

Abstract: Thanks to independent advances in language and image generation, we may soon be in a position to build systems that communicate with humans by combining language and images in their output. This is a skill that humans do not possess: we can take in images at high speed, but we cannot produce them at high speed. The paper explores some of the implications of this idea: which kinds of datasets need to be developed to train such systems, in which cases language and images could be most usefully integrated, and which issues could arise on the side of image generation and of language+image integration.