Keywords: Large Language Models, Multi-Agent Systems
Abstract: Large language models (LLMs) have recently advanced reasoning in multi-agent systems (MAS), yet existing work focuses mainly on improving forward reasoning accuracy, overlooking the potential of adversarial mechanisms that backward-generate erroneous reasoning chains to enhance both accuracy and stability. We propose a novel adversarial learning framework in which a forward generator produces accurate reasoning chains while a backward generator constructs adversarial erroneous ones. Guided by a discriminator that provides gradient-like feedback in the textual domain, both generators iteratively refine their outputs through competitive optimization in the style of generative adversarial networks (GANs). This competitive optimization reduces variability in outputs for identical queries, increases robustness to prompt perturbations, and offers interpretability into the distinct roles of the two generators by dynamically tracking the evolution of reasoning chains. Experiments show that, after two to three rounds of prompt optimization, our method improves reasoning accuracy from 73.7% to 81.6% and reduces instability from 0.39 to 0.08. These results demonstrate the proposed framework's ability to jointly optimize accuracy and stability, and highlight the promise of adversarial forward-backward mechanisms for advancing multi-agent reasoning systems.
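To make the described loop concrete, below is a minimal, hypothetical sketch of the adversarial forward-backward mechanism as the abstract characterizes it. The function name `adversarial_refine`, the `llm` callable (standing in for any text-completion API), and the specific prompts and critique-folding rule are all illustrative assumptions, not the paper's actual implementation.

```python
# Hedged sketch: GAN-style competitive prompt optimization in the textual domain.
from typing import Callable

def adversarial_refine(llm: Callable[[str], str], question: str, rounds: int = 3) -> str:
    """Run a few rounds of forward/backward refinement and return the final chain."""
    fwd_prompt = f"Solve step by step:\n{question}"
    bwd_prompt = f"Write a subtly flawed step-by-step solution for:\n{question}"
    chain = ""
    for _ in range(rounds):  # the abstract reports gains after two to three rounds
        chain = llm(fwd_prompt)    # forward generator: accurate reasoning chain
        flawed = llm(bwd_prompt)   # backward generator: adversarial erroneous chain
        # Discriminator: textual "gradient" feedback contrasting the two chains.
        critique = llm(
            "Compare chains A and B, flag the erroneous steps, and explain why.\n"
            f"A:\n{chain}\nB:\n{flawed}"
        )
        # Competitive refinement: each generator's prompt absorbs the critique.
        fwd_prompt += f"\n\nAvoid these failure modes:\n{critique}"
        bwd_prompt += f"\n\nMake the flaws harder to detect than:\n{critique}"
    return chain
```

Under these assumptions, the discriminator's critique plays the role of the gradient signal: the forward generator conditions on it to avoid the identified errors, while the backward generator conditions on it to produce harder adversarial chains in the next round.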
Primary Area: foundation or frontier models, including LLMs
Submission Number: 24707