Speaker Identification and Dataset Construction Using LLMs: A Bilingual Case Study on Japanese and English Narratives
Speaker identification in narrative analysis is challenging because dialogues are complex, utterance patterns vary, and multiple characters may share similar or ambiguous references. Accurately attributing utterances to the correct speakers is critical for understanding character interactions and narrative structure. To address these challenges, this study proposes a collaborative approach between humans and Large Language Models (LLMs) for constructing speaker identification datasets. The process begins by manually extracting utterances and assigning speaker names for a small subset of the data. This labeled subset is then used to prompt-tune an LLM, which labels speakers across the dataset. Manual correction of the LLM output then ensures accuracy while keeping annotation costs low. Additionally, a paraphrased dataset is constructed to handle cases in which multiple answers are correct. Evaluation results indicate that models with larger parameter counts, particularly those instruction-tuned on Japanese, achieve high speaker identification accuracy.
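Because a speaker can often be named in several valid ways, scoring should accept any of them. The following is a minimal sketch, not the authors' implementation. It assumes that each gold label is a set of acceptable names (the original name plus its paraphrases) and that a prediction counts as correct when it matches any of those names after a hypothetical normalization step (whitespace trimming and case-folding). The functions `normalize` and `speaker_accuracy` and the example names are illustrative only.

```python
from typing import List, Set


def normalize(name: str) -> str:
    # Hypothetical normalization: trim surrounding whitespace and case-fold.
    # Case-folding has no effect on Japanese script but helps with English names.
    return name.strip().casefold()


def speaker_accuracy(predictions: List[str], gold: List[Set[str]]) -> float:
    """Fraction of utterances whose predicted speaker matches any acceptable name."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not predictions:
        return 0.0
    correct = 0
    for pred, acceptable in zip(predictions, gold):
        # Assumed: the gold set holds the original name plus its paraphrases.
        accepted = {normalize(n) for n in acceptable}
        if normalize(pred) in accepted:
            correct += 1
    return correct / len(predictions)


if __name__ == "__main__":
    gold = [{"Holmes", "Sherlock Holmes"}, {"Watson", "Dr. Watson"}, {"Holmes"}]
    preds = ["sherlock holmes", "Watson", "Watson"]
    # Two of the three predictions match an acceptable name.
    print(speaker_accuracy(preds, gold))
```

Treating each gold label as a set keeps the metric a plain accuracy while still crediting a model that chooses a different but valid reference to the same character.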