Keywords: large language models, speculative sampling, prompting paradigms, language model efficiency
TL;DR: This paper discusses integrating speculative sampling into the ReAct paradigm to enhance the efficiency of LLMs.
Abstract: Large language models (LLMs) are increasingly used as agents for interaction with external environments. These interplays are commonly facilitated through various prompting paradigms. However, such paradigms require extended interaction traces between the LLMs and the environment, resulting in low task-solving efficiency. In this work, we integrate speculative sampling (SpS) into the novel ReAct paradigm. In particular, we investigate speculative sampling’s impact on the efficiency of ReAct and the quality of reasoning tasks. Our evaluations using HotPotQA and FEVER datasets demonstrate that implementing speculative sampling alongside ReAct results in a 2.18x-2.62x acceleration compared to using ReAct alone, while only introducing a negligible impact on the reasoning abilities.
Supplementary Material: zip
Submission Number: 178
Loading