Keywords: multi-agent, social intelligence, logical reasoning, complex and dynamic planning, complex and accurate rules
TL;DR: We provide an environment and a benchmark for the combined use of social and logical reasoning, built on a multi-agent trade-and-craft game with complex rules.
Abstract: Modern large language models (LLMs) demonstrate strong capabilities in planning and social reasoning when evaluated separately. However, solving problems in social environments typically requires the integration of both reasoning and social skills, posing a greater challenge. We present \textit{TradeCraft}, a flexible and extensible multi-agent environment that embeds strict reasoning and planning requirements into socially grounded tasks. \textit{TradeCraft} integrates trading, negotiation, and multi-step item crafting, supporting two rule sets: a Minecraft-inspired system and a Little Alchemy 2–inspired system, each with about 1000 items and over 1000 formulas. The environment provides both a web-based GUI for human participation and a text-based API (compatible with \texttt{gymnasium}) for LLM agents, enabling diverse forms of human–AI and multi-agent interaction. To catalyze further research, we introduce a workflow-based LLM agent that leverages task-specific prompting and ReAct mechanisms for trading and crafting, while exhibiting configurable social preferences ranging from cooperative to competitive. We further conduct a nine-dimensional evaluation through self-play experiments, analyzing cooperation, goal alignment, information utilization, theory of mind, and other aspects across multiple LLMs and strategy-guidance settings. \textit{TradeCraft} is open source at `https://github.com/TradeCraft-team/TradeCraft`.
Supplementary Material: zip
Primary Area: datasets and benchmarks
Submission Number: 16092