SEE&TELL: Controllable Narrative Generation from Images

Stephanie M. Lukin; Sungmin Eum

21 Nov 2022 (modified: 05 May 2023) | creativeAI | Readers: Everyone

Keywords: visual storytelling, computer vision, natural language generation, large language models

TL;DR: We present a visual storytelling framework inspired by narrative theories, and evaluate our generated stories for visual novelty and reader willingness to read more.

Abstract: We propose a visual storytelling framework that distinguishes between what is present and observable in the visual storyworld and what story is ultimately told. We implement a model that tells a story from an image using three affordances: 1) a fixed set of visual properties in an image that constitute a holistic representation of its contents, 2) a variable stage direction that establishes the story setting, and 3) incremental questions about character goals. The generated narrative plans are then realized as expressive texts using few-shot learning. Following this approach, we generated 64 visual stories and measured the preservation, loss, and gain of visual information throughout the pipeline, as well as the willingness of a reader to take action to read more. We report that the proportions of visual information preserved and lost vary with the phase of the pipeline and with the stage direction's apparent relatedness to the image, and that 83% of stories were found to be interesting.

Submission Type: archival

Presentation Type: onsite

Presenter: Stephanie Lukin and Sungmin Eum
