Abstract: While large language models (LLMs) have achieved strong results in mathematical reasoning, their ability to generalize reasoning skills across academic disciplines remains underexplored. This work investigates whether equipping models with diverse human reasoning patterns can improve cross-domain generalization.
We present RP-CD, a 2.28M-instance dataset that annotates four reasoning patterns (verification, retrospection, decomposition, and reverse thinking) across diverse academic subjects. Based on RP-CD, we introduce Pattern Fine-Tuning (PFT), a method that injects explicit reasoning patterns into models, enabling them to internalize problem-solving strategies.
Experiments with Qwen2.5-7B-Instruct demonstrate that PFT consistently outperforms supervised fine-tuning (SFT) across a range of benchmarks. On GSM8K, PFT-Mix improves accuracy by 5.7\% over Vanilla-SFT, with further gains on MATH (+4.6\%) and BBH (+7.6\%). PFT-Mix also demonstrates strong cross-lingual transfer on C-Eval (+5.3\%) and improves instruction-following performance on IFEval (+5.4\%). Furthermore, after reinforcement learning (RL), the PFT-Mix-RL model achieves 35.5\% accuracy on RP-CD, surpassing much larger models such as Qwen2.5-72B-Instruct (22.6\%) and DeepSeek-R1-Distill-Qwen-32B (21.7\%).
Our results highlight the value of reasoning patterns for enhancing cross-disciplinary and out-of-domain generalization. This suggests a practical path towards developing LLMs that are more robust in real-world, multi-domain applications.
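To make the dataset and method concrete, the following is a minimal sketch of what a pattern-annotated RP-CD-style instance might look like and how it could be turned into a PFT training example. All field names, the pattern-tag syntax, and the prompt format here are illustrative assumptions, not the paper's actual schema.

```python
# Hypothetical RP-CD-style instance: a question annotated with explicit
# reasoning patterns and a pattern-tagged reasoning trace. The schema is
# an assumption for illustration only.
RP_CD_EXAMPLE = {
    "subject": "physics",
    "question": "A ball is dropped from 20 m. How long until it hits the ground?",
    "patterns": ["decomposition", "verification"],  # two of the four annotated patterns
    "reasoning": (
        "<decomposition> Split into: take g = 9.8 m/s^2, apply t = sqrt(2h/g). "
        "</decomposition> <verification> Check units: sqrt(m / (m/s^2)) = s. "
        "</verification>"
    ),
    "answer": "t = sqrt(2 * 20 / 9.8) = 2.02 s (approx.)",
}

def to_pft_example(instance: dict) -> dict:
    """Format one instance as a prompt/response pair for pattern fine-tuning,
    keeping the explicit pattern tags in the target so the model can learn
    to produce them."""
    prompt = f"[{', '.join(instance['patterns'])}] {instance['question']}"
    response = f"{instance['reasoning']}\nFinal answer: {instance['answer']}"
    return {"prompt": prompt, "response": response}

if __name__ == "__main__":
    print(to_pft_example(RP_CD_EXAMPLE))
```

The design choice sketched here, keeping pattern tags in the supervision target rather than only in the prompt, is one plausible way to make the patterns explicit during fine-tuning; the paper's actual formatting may differ.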
Paper Type: Long
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: generalization, reinforcement learning, reasoning, free-text/natural language explanations
Contribution Types: Model analysis & interpretability, Publicly available software and/or pre-trained models, Data resources
Languages Studied: English, Chinese
Keywords: reasoning, generalization, reinforcement learning, free-text/natural language explanations
Submission Number: 1503