Examining Inter-Consistency of Large Language Models Collaboration: An In-depth Analysis via Debate

Kai Xiong; Xiao Ding; Yixin Cao; Ting Liu; Bing Qin

Published: 07 Oct 2023, Last Modified: 01 Dec 2023, EMNLP 2023 Findings

Submission Type: Regular Long Paper

Submission Track: Interpretability, Interactivity, and Analysis of Models for NLP

Submission Track 2: Theme Track: Large Language Models and the Future of NLP

Keywords: Large Language Models, Inconsistency, Debate, Commonsense Reasoning

TL;DR: Conducting debates among multiple different LLMs to investigate the inter-consistency among LLMs.

Abstract: Large Language Models (LLMs) have shown impressive capabilities in various applications, but they still face various inconsistency issues. Existing works primarily focus on the inconsistency issues within a single LLM, while we complementarily explore the inter-consistency among multiple LLMs for collaboration. To examine whether LLMs can collaborate effectively to achieve a consensus for a shared goal, we focus on commonsense reasoning and introduce a formal debate framework (FORD) to conduct a three-stage debate among LLMs, aligned with real-world scenarios: fair debate, mismatched debate, and roundtable debate. Extensive experiments on various datasets show that LLMs can effectively collaborate to reach a consensus despite noticeable inter-inconsistencies, but imbalances in their abilities can lead to domination by superior LLMs. Leveraging a more advanced LLM such as GPT-4 as an authoritative judge can boost collaboration performance. Our work contributes to understanding the inter-consistency among LLMs and lays the foundation for developing future collaboration methods. Code and data are available at https://github.com/Waste-Wood/FORD.

Submission Number: 2405
