Abstract: We introduce Toyteller, an AI-powered storytelling system that allows users to generate a mix of story texts and visuals by directly manipulating character symbols, as if they were playing with toys. Anthropomorphized motions of character symbols can convey rich and nuanced social interactions between characters; Toyteller leverages these motions as (1) a means for users to steer story text generation and (2) an output format for generated visual accompaniment to user-provided story texts and user-controlled character motions. We enabled motion-steered story text generation and text-steered motion generation by mapping symbol motions and story texts onto a shared semantic vector space, so that motion generation models and large language models can use it as a translation layer. We hope this demonstration sheds light on extending the range of modalities supported by generative human-AI co-creation systems.
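To make the shared-space idea concrete, below is a minimal illustrative sketch, not the authors' implementation: two small encoders project character-symbol motion features and story-text embeddings into one shared semantic vector space, trained with a contrastive alignment objective so that either modality can condition generation in the other. The module names, feature dimensions, and loss are all assumptions introduced here for illustration.

```python
# Illustrative sketch only (assumed architecture, not Toyteller's actual code):
# align motion features and text embeddings in a shared semantic vector space.
import torch
import torch.nn as nn
import torch.nn.functional as F

MOTION_DIM, TEXT_DIM, SHARED_DIM = 64, 768, 256  # assumed sizes

class MotionEncoder(nn.Module):
    """Maps a window of 2D character-symbol motion features to the shared space."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(MOTION_DIM, 256), nn.ReLU(), nn.Linear(256, SHARED_DIM)
        )

    def forward(self, motion_feats):  # (batch, MOTION_DIM)
        return F.normalize(self.net(motion_feats), dim=-1)

class TextEncoder(nn.Module):
    """Projects an off-the-shelf sentence embedding into the shared space."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(TEXT_DIM, SHARED_DIM)

    def forward(self, text_embs):  # (batch, TEXT_DIM)
        return F.normalize(self.proj(text_embs), dim=-1)

def alignment_loss(motion_vecs, text_vecs, temperature=0.07):
    """Symmetric contrastive loss pulling paired motion/text vectors together."""
    logits = motion_vecs @ text_vecs.t() / temperature
    targets = torch.arange(logits.size(0))
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

if __name__ == "__main__":
    motion_enc, text_enc = MotionEncoder(), TextEncoder()
    motion = torch.randn(8, MOTION_DIM)  # toy batch of motion features
    text = torch.randn(8, TEXT_DIM)      # toy batch of text embeddings
    loss = alignment_loss(motion_enc(motion), text_enc(text))
    loss.backward()                      # trainable end to end
    print(f"alignment loss: {loss.item():.3f}")
```

Under this kind of alignment, a user-manipulated motion can be encoded into the shared space and used to retrieve or condition text continuations, while generated or user-provided text can be encoded to steer a motion generation model, which matches the bidirectional steering described in the abstract.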