Non-linear Audio-Visual Storytelling from Scanned Comics: A Character-Centric Approach

Ansh Kushwaha, Sandeep Khanna, Lenin Khangjrakpam, Chiranjoy Chattopadhyay, Gaurav Bhatnagar

Published: 01 Jan 2026, Last Modified: 29 Nov 2025, Crossref, CC BY-SA 4.0

Abstract: We introduce a framework for transforming static comic scans into non-linear, character-focused audiovisual narratives using automated analysis and synthesis. Our hierarchical panel and speech bubble segmentation algorithm incorporates spatial and semantic features. We propose a character relationship graph and an emotion trajectory modeling system that facilitate dynamic narrative generation. A character-focused panel sequencing system enables non-linear storytelling by adapting the narrative flow to the user-chosen character. We also present a system that synchronizes text animations in speech bubbles with audio to produce coherent audiovisual sequences. Experiments on benchmark datasets demonstrate the effectiveness of the framework in terms of panel segmentation accuracy, character identification precision, and the quality of the generated audiovisual narratives with synchronized text animations. Quantitative and qualitative analyses show significant improvements over existing approaches in handling complex layouts and maintaining narrative coherence.
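
The abstract does not spell out how character-focused panel sequencing is computed. As a rough illustration only, the idea of re-ordering the narrative around a user-chosen character can be sketched as a filter over panels in reading order, optionally keeping neighboring panels for context. The `Panel` class, the `sequence_for_character` function, and the `context` parameter are hypothetical names introduced here, not the authors' method; character detection is assumed to have happened upstream.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Panel:
    """Hypothetical panel record (not taken from the paper)."""
    index: int              # position in the original reading order
    characters: frozenset   # names of characters assumed detected in this panel


def sequence_for_character(panels, character, context=0):
    """Return indices of panels to show for a chosen character.

    Keeps every panel containing `character`, plus up to `context`
    neighboring panels on each side, in original reading order.
    This is an illustrative assumption, not the published algorithm.
    """
    valid = {p.index for p in panels}
    keep = set()
    for p in panels:
        if character in p.characters:
            # Add the panel itself and its surrounding context window.
            for i in range(p.index - context, p.index + context + 1):
                keep.add(i)
    # Discard window positions that fall outside the page.
    return sorted(keep & valid)


panels = [
    Panel(0, frozenset({"A"})),
    Panel(1, frozenset({"B"})),
    Panel(2, frozenset({"A", "B"})),
    Panel(3, frozenset({"C"})),
]
print(sequence_for_character(panels, "A"))  # [0, 2]
```

A fuller system along the lines the abstract describes would presumably weight panels using the character relationship graph and emotion trajectories rather than simple presence; this sketch covers only the selection step.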

External IDs: doi:10.1007/978-3-032-09368-4_10