Keywords: multi-agent LLMs, LLM agents, creative writing, creativity evaluation, peer review, distributed critique, agent interaction, novelty, science fiction generation, LLM-as-a-judge
Abstract: Large Language Models (LLMs) often struggle with creative generation, and multi-agent frameworks that improve reasoning through interaction can paradoxically hinder creativity by inducing content homogenization. We introduce LLM Review, a peer-review-inspired framework implementing Blind Peer Review: agents exchange targeted feedback while revising independently, preserving divergent creative trajectories. To enable rigorous evaluation, we propose SciFi-100, a science fiction writing dataset with a unified framework combining LLM-as-a-judge scoring, human annotation, and rule-based novelty metrics. Experiments demonstrate that LLM Review consistently outperforms multi-agent baselines, and smaller models with our framework can surpass larger single-agent models, suggesting interaction structure may substitute for model scale.
Paper Type: Long
Research Area: AI/LLM Agents
Research Area Keywords: multi-agent systems, LLM agents, agent interaction, coordination, role-playing agents
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 7830