Can Speculative Sampling Accelerate ReAct Without Compromising Reasoning Quality?

Published: 19 Mar 2024 · Last Modified: 19 Mar 2024 · Tiny Papers @ ICLR 2024 · CC BY 4.0
Keywords: large language models, speculative sampling, prompting paradigms, language model efficiency
TL;DR: This paper integrates speculative sampling into the ReAct paradigm to improve the efficiency of LLM agents without compromising reasoning quality.
Abstract: Large language models (LLMs) are increasingly used as agents that interact with external environments. These interactions are commonly facilitated through prompting paradigms; however, such paradigms require extended interaction traces between the LLM and the environment, resulting in low task-solving efficiency. In this work, we integrate speculative sampling (SpS) into the ReAct paradigm. In particular, we investigate speculative sampling's impact on the efficiency of ReAct and on the quality of its reasoning. Our evaluations on the HotPotQA and FEVER datasets demonstrate that combining speculative sampling with ReAct yields a 2.18x-2.62x speedup over ReAct alone, while introducing only a negligible impact on reasoning ability.
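
For context, the mechanism being integrated is the standard speculative sampling accept/reject rule (Leviathan et al., 2023; Chen et al., 2023): a cheap draft model proposes a short block of tokens, and the target model verifies the whole block in a single pass, so accepted tokens cost roughly one target-model call per block rather than one per token. Below is a minimal, self-contained sketch of that rule using toy distributions; the function names and structure are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def speculative_step(p_target, p_draft, k, vocab):
    """One speculative-sampling step: the draft model proposes k tokens,
    then the target model verifies them.

    p_target, p_draft: callables mapping a token prefix (list of ints) to a
    probability vector over `vocab` -- toy stand-ins for the real LLMs.
    Returns the accepted tokens, plus one corrected or bonus token.
    """
    # 1. Draft model proposes k tokens autoregressively (cheap).
    drafts = []
    for _ in range(k):
        q = p_draft(drafts)
        drafts.append(rng.choice(vocab, p=q))

    # 2. Verify each draft token x: accept with prob min(1, p(x)/q(x)).
    #    (With a real target model, the p's below come from one parallel pass.)
    accepted = []
    for x in drafts:
        p = p_target(accepted)
        q = p_draft(accepted)
        if rng.random() < min(1.0, p[x] / q[x]):
            accepted.append(x)
        else:
            # Rejected: resample from the normalized residual max(0, p - q),
            # which keeps the output distribution exactly p_target.
            residual = np.maximum(p - q, 0.0)
            accepted.append(rng.choice(vocab, p=residual / residual.sum()))
            return accepted

    # All k drafts accepted: sample one bonus token from the target model.
    accepted.append(rng.choice(vocab, p=p_target(accepted)))
    return accepted

# Toy usage: a uniform draft model against a peaked target model.
V = 5
vocab = np.arange(V)
uniform = lambda prefix: np.full(V, 1.0 / V)
peaked = lambda prefix: np.array([0.6, 0.1, 0.1, 0.1, 0.1])
print(speculative_step(peaked, uniform, k=4, vocab=vocab))
```

In a ReAct-style agent, each Thought/Action string would be decoded with steps like this instead of token-by-token target sampling, which is where the reported 2.18x-2.62x wall-clock speedup would come from.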
Supplementary Material: zip
Submission Number: 178