Abstract: Language models can generate lists of salient
literary characters for specific relations but
struggle with long, complete lists spanning entire novels. This paper studies the non-standard
setting of extracting complete entity lists from
full-length books, such as identifying all 50+
friends of Harry Potter across the 7-volume
book series. We construct a benchmark dataset
with meticulously compiled ground truth, posing it as a challenge for the research community.
We present a first-cut method to tackle this task,
based on RAG with LLMs. The novel contribution of our method is harnessing IR-style pseudo-relevance feedback for effective
passage retrieval from literary texts. Experimental results show that our approach clearly
outperforms both LLM-only and standard RAG
baselines, achieving higher recall while maintaining acceptable precision.