Weak2Wise: An Automated, Lightweight Framework for Weak-LLM-Friendly Reasoning Synthesis
Abstract: Recent advances in large language model (LLM) fine-tuning have shown that training data augmented with high-quality reasoning traces can markedly improve downstream performance. However, existing approaches typically rely on expensive manual annotation or auxiliary models, and fail to address the unique constraints of smaller “weak” LLMs. To bridge these gaps, we introduce Weak2Wise, a fully automated, lightweight framework for synthesizing high-quality, weak-LLM-friendly reasoning traces. Starting from a QA dataset, Weak2Wise filters out samples that the weak LLM already answers correctly, gathers diverse candidate reasoning traces from multiple strong LLMs, and applies our Step-Mask scoring to rank and truncate the most guidance-effective traces. These traces are then used for fine-tuning, yielding substantial improvements in the weak LLM’s reasoning abilities. The name Weak2Wise carries two meanings: using a “weak” LLM to select the “wisest” reasoning traces generated by stronger LLMs, and fine-tuning the same weak LLM on these traces so that it becomes “wiser”. We further use Weak2Wise to build GR-1K, a 1,000-sample math and science QA-reasoning dataset optimized for weak LLMs, and fine-tune Qwen2.5-7B on it to obtain GR-7B, which achieves superior performance on the AIME2024, MATH-500, and GPQA Diamond benchmarks. Our code is publicly released to facilitate further research.
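As a rough illustration only, the Python sketch below outlines the shape of the synthesis loop described in the abstract (filter, gather candidate traces, rank by guidance effectiveness, keep the best). The type aliases, function names (weak_answer, strong_trace_fns, score_trace), and the single-best-trace selection are illustrative assumptions; the abstract does not specify the internals of Step-Mask scoring or the trace-truncation step, so they are abstracted behind a generic scoring callable here.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical interfaces; the real framework's APIs are not given in the abstract.
AnswerFn = Callable[[str], str]        # weak LLM: question -> predicted answer
TraceFn = Callable[[str], str]         # strong LLM: question -> reasoning trace
ScoreFn = Callable[[str, str], float]  # Step-Mask-style guidance score of a trace


@dataclass
class QASample:
    question: str
    answer: str


def weak2wise_pipeline(
    dataset: List[QASample],
    weak_answer: AnswerFn,
    strong_trace_fns: List[TraceFn],
    score_trace: ScoreFn,
) -> List[dict]:
    """Sketch of the Weak2Wise data-synthesis loop (illustrative, not the released code)."""
    training_records = []
    for sample in dataset:
        # 1) Skip questions the weak LLM can already answer correctly.
        if weak_answer(sample.question).strip() == sample.answer.strip():
            continue
        # 2) Gather diverse candidate reasoning traces from multiple strong LLMs.
        candidates = [generate(sample.question) for generate in strong_trace_fns]
        # 3) Rank candidates by their guidance score for the weak LLM and keep the best.
        best_trace = max(candidates, key=lambda t: score_trace(sample.question, t))
        training_records.append(
            {"question": sample.question, "trace": best_trace, "answer": sample.answer}
        )
    return training_records
```

In the framework itself, the retained question-trace-answer records are then used to fine-tune the weak LLM.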