Distilling System 2 into System 1

Published: 10 Oct 2024, Last Modified: 10 Oct 2024
Venue: Sys2-Reasoning Poster
License: CC BY 4.0
Keywords: Distilling System 2, LLM generations
Abstract: Large language models (LLMs) can spend extra compute during inference to generate intermediate thoughts, which helps to produce better final responses. Since Chain-of-Thought \citep{CoT}, many such {\em System 2} techniques have been proposed, such as Rephrase and Respond \citep{RaR}, System 2 Attention \citep{S2A} and Branch-Solve-Merge \citep{BSM}. In this work we investigate self-supervised methods to ``compile'' (distill) higher quality outputs from System 2 techniques back into LLM generations {\em without} intermediate reasoning token sequences, as this reasoning has been distilled into {\em System 1}. We show that several such techniques can be successfully distilled, yielding improved results over the original System 1 performance at lower inference cost than System 2. We posit that System 2 distillation will be an important feature of future continually learning AI systems, enabling them to focus System 2 capabilities on the reasoning tasks that they cannot yet do well.
Submission Number: 65
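
The abstract describes a self-supervised pipeline: run a System 2 method (e.g. Chain-of-Thought) on unlabeled inputs, keep only the final answers, filter them without labels, and fine-tune the model to produce those answers directly. Below is a minimal sketch of one such pipeline, assuming a generic sampled `generate` call, a prompt template, and a self-consistency (majority-vote) filter; the function names and the exact filtering criterion are illustrative assumptions, not necessarily the paper's exact recipe.

```python
from collections import Counter
from typing import Callable, List, Tuple


def build_distillation_set(
    inputs: List[str],
    system2_prompt: Callable[[str], str],   # wraps x in a System 2 template (e.g. CoT, RaR)
    extract_answer: Callable[[str], str],   # strips intermediate reasoning, keeps the final answer
    generate: Callable[[str], str],         # sampled (temperature > 0) LLM call; hypothetical interface
    num_samples: int = 8,
    min_agreement: float = 0.75,
) -> List[Tuple[str, str]]:
    """Self-supervised construction of (input, final answer) training pairs.

    For each unlabeled input, sample several System 2 generations, extract the
    final answers, and accept the majority answer only when the samples agree
    strongly enough (a self-consistency filter). The resulting pairs contain no
    intermediate reasoning tokens, so fine-tuning on them teaches the model to
    answer directly (System 1).
    """
    distilled: List[Tuple[str, str]] = []
    for x in inputs:
        # Sample multiple System 2 generations and discard their reasoning traces.
        answers = [extract_answer(generate(system2_prompt(x))) for _ in range(num_samples)]
        answer, count = Counter(answers).most_common(1)[0]
        if count / num_samples >= min_agreement:
            # Target is the final answer only; the System 2 reasoning is not kept.
            distilled.append((x, answer))
    return distilled
```

The resulting pairs would then feed a standard supervised fine-tuning step on the base model, so that at inference time it emits the answer without generating the intermediate System 2 token sequence.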