Abstract: Collaboration between multiple Large Language Models (LLMs) has attracted significant attention for its potential to mitigate hallucinations and enhance reasoning capabilities. Previous approaches, such as multi-agent debate and decoding-time integration, either rely on highly capable models with strong self-reflection abilities or are limited to models sharing the same tokenizer. To address these limitations, we introduce PToco (Prefix-based Token-level Collaboration), a novel mechanism that enables effective collaboration among less capable LLMs, independent of tokenizer differences. PToco uses a prefix-grouping method to extract consensus among tokens with varying levels of granularity, ensuring coherent and robust token generation across multiple models. Experimental results on a series of reasoning tasks demonstrate that PToco significantly improves performance over individual models. Furthermore, this approach generalizes well across different quantities and sizes of participating models, providing a more flexible and efficient solution for multi-LLM ensembles.
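To make the idea of prefix-based consensus concrete, the following is a minimal sketch under stated assumptions, not the PToco algorithm itself, whose details the abstract does not specify. It assumes each model proposes its next token as a plain string, and that a shorter proposal counts as supported by every proposal that begins with it. The function name `prefix_consensus` and the tie-breaking rule are hypothetical.

```python
def prefix_consensus(proposals):
    """Pick the proposal that the most proposals agree on as a prefix.

    Illustrative assumption: models with different tokenizers emit next-token
    strings of different lengths, so "the" and "therefore" can agree on the
    shared prefix "the". This is a sketch, not the paper's method.
    """
    # Ignore empty strings, which would trivially be a prefix of everything.
    candidates = [p for p in proposals if p]
    if not candidates:
        raise ValueError("need at least one non-empty proposal")

    # Support of a candidate = number of proposals that start with it.
    support = {
        c: sum(p.startswith(c) for p in candidates)
        for c in set(candidates)
    }

    # Hypothetical tie-break: prefer more support, then the longer string,
    # then lexicographic order so the result is deterministic.
    return max(support, key=lambda c: (support[c], len(c), c))


if __name__ == "__main__":
    # Four hypothetical models with different token granularities.
    print(prefix_consensus(["there", "the", "therefore", "a"]))
```

In this sketch, the coarse-grained proposals "there" and "therefore" both count toward the fine-grained proposal "the", so the group converges on the shared prefix even though no two models emitted the same string.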