Abstract: Thanks to independent advances in language and image generation, we could soon have
systems that communicate with humans by combining language and images in their output, a skill
that humans do not possess (we receive, but do not produce, images at high speed). The paper explores
some of the implications of this idea: which kinds of data sets need to be developed to train such systems,
in which cases language and images could be most usefully integrated, and which issues could arise on
the image generation and language+image integration sides.