# Belief-Calibrated Multi-Agent Consensus Seeking for Complex NLP Tasks
<img src=img/introduction.png width=50% />

A multi-agent system (MAS) solves complex natural language processing (NLP) tasks more effectively through collaboration among multiple agents, and consensus seeking is a fundamental mechanism of that collaboration. However, existing consensus-seeking approaches typically rely on voting to judge whether consensus has been reached, overlooking contradictions in system-internal beliefs that destabilize the consensus. Moreover, these methods often have each agent update its result through indiscriminate collaboration with every other agent. Such uniform interaction fails to identify the optimal collaborators for each agent, hindering the emergence of a stable consensus. To address these challenges, we provide a theoretical framework for selecting optimal collaborators that maximize consensus stability. Based on the resulting theorems, we propose the Belief-Calibrated Consensus Seeking (BCCS) framework, which facilitates stable consensus by selecting optimal collaborators and calibrating the consensus judgment with system-internal beliefs. Experimental results on the MATH and MMLU benchmark datasets show that BCCS outperforms the best existing results by 2.23% and 3.95% in accuracy on challenging tasks, respectively.
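
To make the contrast between vote-based and belief-aware consensus judgment concrete, the following is a minimal illustrative sketch. It is not the BCCS algorithm from this repository: the function names, the scalar per-agent belief in [0, 1], and the 0.7 threshold are all assumptions chosen for illustration.

```python
from collections import Counter, defaultdict


def vote_consensus(answers, threshold=0.7):
    """Judge consensus purely by the share of agents backing the top answer."""
    top, count = Counter(answers).most_common(1)[0]
    return top, count / len(answers) >= threshold


def belief_weighted_consensus(answers, beliefs, threshold=0.7):
    """Judge consensus by the share of total belief mass behind the top answer.

    `beliefs` is a hypothetical per-agent confidence in [0, 1]; how BCCS
    actually defines and calibrates beliefs is not shown here.
    """
    mass = defaultdict(float)
    for answer, belief in zip(answers, beliefs):
        mass[answer] += belief
    top = max(mass, key=mass.get)
    total = sum(mass.values())
    return top, total > 0 and mass[top] / total >= threshold


# Three weakly confident agents agree; one highly confident agent dissents.
answers = ["A", "A", "A", "B"]
beliefs = [0.3, 0.3, 0.3, 0.95]

print(vote_consensus(answers))                      # ('A', True)
print(belief_weighted_consensus(answers, beliefs))  # ('B', False)
```

In this toy case, plain voting declares consensus on `A`, while the belief-weighted check reports that no answer carries enough belief mass. This is the kind of hidden contradiction that a vote alone overlooks.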

## Requirements

```
Python==3.9.19
torch==2.4.0
vllm==0.6.1
transformers==4.44.2
openai==1.40.0
scipy==1.13.1
```

## Datasets

We conduct experiments on two benchmark datasets: [MATH](https://huggingface.co/datasets/HuggingFaceTB/MATH) and [MMLU](https://people.eecs.berkeley.edu/~hendrycks/data.tar).
For each dataset, we sample groups of cases for evaluation.
Execute the following commands for sampling:
```
cd Datasets
# MATH
python math_sampling.py
# MMLU
python mmlu_sampling.py
```
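
For readers unfamiliar with group-wise sampling, the idea can be sketched as below. This is a hypothetical illustration, not the logic of `math_sampling.py` or `mmlu_sampling.py`: the function name `sample_groups`, its parameters, and the seeding scheme are assumptions, and the actual scripts may choose group counts and sizes differently.

```python
import random


def sample_groups(cases, num_groups, group_size, seed=0):
    """Draw `num_groups` independent groups of `group_size` cases each.

    Sampling is without replacement inside a group; different groups may
    overlap. A fixed seed keeps the sampled groups reproducible.
    """
    rng = random.Random(seed)
    return [rng.sample(cases, group_size) for _ in range(num_groups)]


# Toy usage with integer IDs standing in for dataset cases.
groups = sample_groups(list(range(1000)), num_groups=3, group_size=5)
for i, group in enumerate(groups):
    print(f"group {i}: {group}")
```

Fixing the seed means repeated runs evaluate on the same cases, so results from different methods stay comparable.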

## Evaluation
### MATH
Execute the following commands for evaluation on MATH:
```
cd MATH
# Multi-agent collaboration
python run_math.py
# Evaluation
python evaluate.py
```

### MMLU
Execute the following commands for evaluation on MMLU:
```
cd MMLU
# Multi-agent collaboration
python run_mmlu.py
# Evaluation
python evaluate.py
```
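
As a rough picture of what accuracy evaluation over sampled groups involves, consider the sketch below. It is an assumption-laden illustration rather than the content of `evaluate.py`: the helper names are hypothetical, and answers are compared by exact string match after stripping whitespace, whereas MATH answers in particular usually need normalization before comparison.

```python
from statistics import mean, pstdev


def group_accuracy(predictions, references):
    """Fraction of predictions that exactly match their reference answer."""
    if not references or len(predictions) != len(references):
        raise ValueError("predictions and references must be non-empty and equal in length")
    correct = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return correct / len(references)


def summarize(groups):
    """Return (mean, population standard deviation) of per-group accuracy."""
    accuracies = [group_accuracy(preds, refs) for preds, refs in groups]
    return mean(accuracies), pstdev(accuracies)


# Toy usage: two groups of two cases each.
groups = [
    (["A", "B"], ["A", "B"]),  # both correct
    (["A", "C"], ["A", "B"]),  # one correct
]
avg, std = summarize(groups)
print(f"accuracy: {avg:.2%} (std {std:.2%})")
```

Reporting the spread across groups alongside the mean indicates how sensitive the accuracy is to which cases were sampled.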

## Results
<img src=img/results.png width=80% />
