Keywords: large language models, reinforcement learning, social deduction game
TL;DR: We present an innovative framework that integrates Large Language Models (LLMs) with an external module to enhance the reasoning capabilities of LLM-based agents.
Abstract: Despite their success across a broad spectrum of general tasks, Large Language Models (LLMs) often underperform in domain-specific tasks not well-represented in their pre-training corpora. We introduce an innovative framework integrating general-purpose LLMs with an external \emph{Thinker} module to enhance the reasoning capabilities of LLM-based agents. Unlike approaches that augment LLMs through prompt engineering, our Thinker module directly accesses knowledge from domain databases and employs supervised or reinforcement learning (RL). We establish a reasoning hierarchy in which LLMs handle intuitive System-1 tasks that are domain-agnostic, while the Thinker focuses on System-2 tasks that require complex logical analysis and domain-specific knowledge. Our framework is demonstrated through a 9-player Werewolf game that necessitates dual-system reasoning. We design a communication protocol between LLMs and the Thinker, then optimize the Thinker through online RL and refine it by imitation learning. Drawing from 18,800 human games, this work also contributes the largest dataset for social deduction games to date. Experiments show that GPT-3.5 and GPT-4, augmented with the Thinker, significantly improve in deductive reasoning, speech generation, and online gameplay as evaluated by human players. Further, integrating a fine-tuned 6B Werewolf-specific LLM with the Thinker achieves performance on par with GPT-4.
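To make the dual-system division of labor concrete, below is a minimal, hypothetical Python sketch of the LLM-Thinker communication protocol summarized in the abstract: the Thinker consumes structured game state and emits a discrete action plus a structured speech intent (System-2), which an LLM then verbalizes as natural language (System-1). All class, field, and function names here are illustrative assumptions, not the authors' actual interface.

```python
# Hypothetical sketch of the LLM-Thinker protocol; names are assumptions.
from dataclasses import dataclass, field

@dataclass
class GameObservation:
    """Structured Werewolf state visible to one player (assumed schema)."""
    round: int
    role: str                           # e.g. "seer", "werewolf", "villager"
    alive_players: list[int]
    speech_history: list[str] = field(default_factory=list)

@dataclass
class ThinkerDecision:
    """System-2 output: a game action plus a structured speech intent."""
    action: str                         # e.g. "vote:3" or "check:5"
    speech_intent: dict                 # e.g. {"claim_role": "seer", "accuse": 3}

class Thinker:
    """External module; in the paper it is trained with online RL and
    refined by imitation learning on human games. Stubbed here."""
    def decide(self, obs: GameObservation) -> ThinkerDecision:
        # Placeholder policy; a trained policy network would go here.
        return ThinkerDecision(action="vote:3", speech_intent={"accuse": 3})

def llm_speak(obs: GameObservation, intent: dict) -> str:
    """System-1 step: an LLM turns the Thinker's intent into a speech.
    A stub stands in for a call to, e.g., GPT-3.5/GPT-4."""
    prompt = (f"You are a Werewolf player with role {obs.role}. "
              f"Express this intent in natural language: {intent}")
    return prompt  # in practice: return llm_client.complete(prompt)

# Usage: one turn of the loop described in the abstract.
obs = GameObservation(round=2, role="seer", alive_players=[1, 2, 3, 5])
decision = Thinker().decide(obs)
speech = llm_speak(obs, decision.speech_intent)
```

The key design point this sketch illustrates is that the LLM never needs domain-specific strategy: all Werewolf-specific deduction lives in the Thinker's structured decision, keeping the LLM's job domain-agnostic.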
Supplementary Material: zip
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 13369