Keywords: Agentic AI, Multimodal Generation, Stable Diffusion, LangChain, Large Language Models (LLMs), Generative AI, Prompt Engineering, Image Consistency
TL;DR: An agentic AI framework that transforms text-based stories into visually consistent comic panels using LLMs and diffusion models, enabling immersive, multilingual, and style-adaptive visual storytelling.
Abstract: In today’s visually driven digital culture, many rich narratives, ranging from ancient folk tales to personal memories and imaginative ideas, remain confined to text, limiting their reach and experiential impact. Enabling users to visualize these stories as immersive visual narratives can inspire creativity, preserve cultural heritage, and engage younger, media-savvy audiences. We introduce an agentic AI framework that transforms such texts into fully illustrated, style-consistent comic panels, enabling end-to-end visual storytelling from natural language. The system accepts user-provided inputs including the story, genre, artistic style, and desired panel count. When any of these is absent, dedicated agents automatically infer the narrative mood, assign thematic tags, suggest a visual style, and segment the story into coherent scenes. The architecture is composed of modular agents, orchestrated with LangChain, that handle metadata extraction, narrative decomposition, prompt engineering, and image generation. Leveraging LLMs and Stable Diffusion XL, the system generates and stylizes story panels from detailed visual prompts, maintaining consistency in character identity and setting throughout the narrative. Designed with modularity and extensibility in mind, the framework supports multilingual storytelling, artistic style adaptation, and scalable deployment. Potential applications span digital storytelling, education, visual media, and cultural preservation.
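To make the described pipeline concrete, the sketch below outlines how such an agent chain could be wired together with LangChain and the diffusers SDXL pipeline. This is a minimal illustration inferred from the abstract, not the authors' implementation: the model names, prompt wording, scene delimiter, and the `story_to_panels` helper are assumptions introduced here.

```python
# Illustrative sketch of the agentic comic-generation pipeline (assumptions noted above).
import torch
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from diffusers import StableDiffusionXLPipeline

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.3)  # hypothetical LLM choice

# Agent 1: infer missing metadata (genre, mood, suggested art style).
metadata_agent = (
    ChatPromptTemplate.from_template(
        "Read the story below and return its genre, mood, and a suitable "
        "comic art style as a comma-separated list.\n\nStory:\n{story}"
    )
    | llm
    | StrOutputParser()
)

# Agent 2: decompose the narrative into a fixed number of coherent scenes.
scene_agent = (
    ChatPromptTemplate.from_template(
        "Split this story into {panels} scenes. Separate scenes with '###'.\n\n{story}"
    )
    | llm
    | StrOutputParser()
)

# Agent 3: turn each scene into a detailed SDXL prompt, repeating the character
# and setting descriptions so panels stay visually consistent.
prompt_agent = (
    ChatPromptTemplate.from_template(
        "Write one Stable Diffusion XL prompt for this scene in a {style} style. "
        "Always describe the characters as: {character_sheet}.\n\nScene:\n{scene}"
    )
    | llm
    | StrOutputParser()
)

# Image generator: Stable Diffusion XL via diffusers.
sdxl = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

def story_to_panels(story: str, style: str | None = None,
                    character_sheet: str = "", panels: int = 4):
    """Segment the story, build visual prompts, and render one image per panel."""
    if style is None:
        # No style given: fall back to the metadata agent's suggestion.
        style = metadata_agent.invoke({"story": story}).split(",")[-1].strip()
    scenes = scene_agent.invoke({"story": story, "panels": panels}).split("###")
    images = []
    for scene in scenes[:panels]:
        visual_prompt = prompt_agent.invoke(
            {"scene": scene, "style": style, "character_sheet": character_sheet}
        )
        images.append(sdxl(prompt=visual_prompt).images[0])
    return images
```

In this sketch, panel-to-panel consistency comes from reusing a fixed character sheet in every prompt; the paper's framework may use additional mechanisms (e.g., dedicated consistency agents) not shown here.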
Submission Number: 9