Abstract: Large Language Models (LLMs) often excel in specific domains but fall short in others due to the limitations of their training. Thus, enabling LLMs to solve problems collaboratively by integrating their complementary knowledge promises to improve their performance across domains. To realize this potential, we introduce a novel Collaborative Speculative Decoding (CoSD) algorithm that enables efficient LLM knowledge fusion at test time without requiring additional model training. CoSD employs a draft model to generate initial sequences and an easy-to-learn rule or decision tree to decide when to invoke an assistant model to improve these drafts. CoSD not only enhances knowledge fusion but also improves inference efficiency; it is transferable across domains and offers greater explainability. Experimental results demonstrate that CoSD improves accuracy by up to 10% across benchmarks compared to existing methods, providing a scalable and effective solution for LLM-based applications. Our code has been released at https://github.com/ATP-1010/CoSD.
Lay Summary: Large language models (LLMs) are good at many tasks, but each model often has strengths in different areas. This becomes a problem when one model alone can’t solve a task well because it lacks certain knowledge. To address this, we developed a new method called Collaborative Speculative Decoding (CoSD). It lets multiple LLMs work together during inference, without needing to retrain any of them. One model drafts a solution, and a lightweight rule decides when another model should step in to improve it. This approach is fast, works across domains, and is easy to interpret. We show that CoSD can improve accuracy by up to 10% on several benchmarks. Our work offers a practical way to combine the strengths of different models, making LLM-based systems more flexible and reliable.
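The draft-then-verify collaboration described above can be illustrated with a minimal sketch. This is a hypothetical toy version, not the authors' exact CoSD algorithm: the two "models" are stand-in lookup tables, and the lightweight rule is assumed to be a simple confidence threshold on the draft model's proposed token.

```python
# Hypothetical sketch of draft/assistant collaboration (not the exact CoSD
# method): a draft model proposes each token with a confidence score, and a
# simple rule defers to an assistant model when that confidence is low.

def draft_model(prefix):
    # Toy stand-in: returns (token, confidence) for the next position.
    table = {0: ("The", 0.9), 1: ("answer", 0.4), 2: ("is", 0.95), 3: ("42", 0.3)}
    return table[len(prefix)]

def assistant_model(prefix):
    # Toy stand-in for a model with complementary domain knowledge.
    table = {0: "The", 1: "result", 2: "is", 3: "7"}
    return table[len(prefix)]

def collaborative_decode(steps, threshold=0.5):
    out = []
    for _ in range(steps):
        tok, conf = draft_model(out)
        if conf < threshold:  # rule: low draft confidence -> invoke assistant
            tok = assistant_model(out)
        out.append(tok)
    return out

print(collaborative_decode(4))  # positions 1 and 3 are replaced by the assistant
```

In the paper's actual method, the decision rule can also be a learned decision tree over richer features than a single confidence score; the threshold rule here is only the simplest instance of that idea.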
Link To Code: https://github.com/ATP-1010/CoSD
Primary Area: Deep Learning->Large Language Models
Keywords: Large language model, Knowledge fusion, Speculative decoding
Submission Number: 7061