Manga109Dialog: A Large-Scale Dialogue Dataset for Comics Speaker Detection

Yingxuan Li, Kiyoharu Aizawa, Yusuke Matsui

Published: 01 Jan 2024, Last Modified: 25 Jul 2025. Venue: ICME 2024. License: CC BY-SA 4.0

Abstract: The expanding market for e-comics has driven the development of automated methods for analyzing comics. To enhance machine understanding of comics, an automated method is needed for linking text in comics to the characters who speak it. In this study, we developed Manga109Dialog, the world’s largest speaker-to-text annotation dataset for comics, containing 132,692 pairs. We also proposed a novel deep learning-based method using scene graph generation models. To account for the unique features of comics, we further improved performance by considering the frame reading order. Our experiments with Manga109Dialog show that our scene-graph-based approach outperforms existing methods, achieving a prediction accuracy of over 75%, thus establishing a robust benchmark for speaker detection in comics.