Keywords: Large Language Models, Awareness, Multi-Agent Collaboration, Collaborative Reasoning, Debate
Abstract: As Large Language Models advance in reasoning and generation, interest in their collaborative potential has grown. This paper investigates agentic reasoning collectives, i.e., structured groups of LLMs that jointly solve awareness-focused tasks. We introduce AwareXtend, a benchmark evaluating introspective and social awareness across five dimensions: Capability, Mission, Emotion, Culture, and Perspective. Unlike existing benchmarks, it poses multi-dimensional, context-sensitive challenges that assess awareness-driven reasoning. We propose a collaboration strategy based on Peer Debate and compare it against a family of hierarchical methods that extend the Mixture-of-Agents (MoA) approach. Experiments with groups of LLMs ranging from 1B to 14B parameters show that Peer Debate consistently outperforms both individual models and MoA approaches. These results indicate that collaborative reasoning improves model performance on awareness-related tasks, suggesting that interaction can support more consistent and contextually informed behavior.
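To make the Peer Debate idea concrete, the following is a minimal sketch of a generic peer-debate loop among LLM agents; it is an illustration only, not the paper's protocol, and the `query_model(model, prompt)` helper is a hypothetical stand-in for whatever inference API is used.

```python
def peer_debate(models, question, rounds=2):
    """Illustrative peer-debate loop: each agent answers, then revises
    after reading its peers' answers. Assumes a hypothetical
    query_model(model, prompt) -> str helper (not from the paper)."""
    # Round 0: every agent answers independently.
    answers = [query_model(m, question) for m in models]
    for _ in range(rounds):
        revised = []
        for i, m in enumerate(models):
            # Show each agent its peers' answers, not its own.
            peers = [a for j, a in enumerate(answers) if j != i]
            prompt = (
                f"Question: {question}\n"
                f"Your previous answer: {answers[i]}\n"
                "Peer answers:\n"
                + "\n".join(f"- {p}" for p in peers)
                + "\nCritique the peer answers and give a revised answer."
            )
            revised.append(query_model(m, prompt))
        answers = revised
    # Final answers could be aggregated, e.g., by majority vote or a judge.
    return answers
```

In contrast, an MoA-style hierarchy would route the proposers' answers to a designated aggregator model rather than letting peers critique one another symmetrically.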
Submission Number: 3