Keywords: In-Context Learning, Long CoT Reasoning, Large Language Models
Abstract: Recent advances in Large Reasoning Models (LRMs) highlight the importance of long chain-of-thought (CoT) reasoning for complex tasks. However, most existing methods rely on post-training that tunes model parameters, leaving it unclear whether pre-trained models intrinsically possess such capabilities. We propose in-context learning (ICL) with long CoT demonstrations as a tuning-free approach to investigate this question. Across Qwen 2.5 (7B, 32B) and DeepSeek V3 models on mathematical reasoning tasks, we demonstrate that ICL empowers base models to exhibit sophisticated long CoT behaviors such as reflection and verification. It also delivers consistent performance gains (from pass@1 to pass@K) over direct generation, supporting the conjecture that base models possess inherent reasoning capabilities that are not fully leveraged by direct prompting. Moreover, our in-depth analysis reveals that long CoT ICL not only improves accuracy on easy problems but also enables models to solve previously intractable medium-difficulty problems. Finally, we validate that tasks benefit most from long CoT ICL when problem-relevant demonstrations are provided: given such demonstrations, the performance of DeepSeek V3 on AIME25 improves by 6.5\%. We hope this work advances the understanding of the mechanisms and intrinsic capabilities underlying long CoT reasoning.
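The abstract describes prepending long CoT demonstrations to a base model's prompt without any parameter tuning. Below is a minimal sketch of what such a setup could look like; the model name, demonstration text, and sampling parameters are illustrative assumptions, not the authors' released code or exact configuration.

```python
# Minimal sketch of long-CoT in-context prompting for a base model (no tuning).
# Assumptions: Hugging Face transformers, a Qwen 2.5 base checkpoint, and a
# placeholder demonstration; real demonstrations would be full, problem-relevant
# long-CoT traces with reflection and verification steps.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "Qwen/Qwen2.5-7B"  # base (non-instruct) model, as studied in the paper

# One toy long-CoT demonstration (hypothetical content).
DEMO = (
    "Problem: What is 17 * 24?\n"
    "Solution: Let me work through this step by step. 17 * 24 = 17 * 20 + 17 * 4 "
    "= 340 + 68 = 408. Wait, let me verify: 408 / 24 = 17, so the result checks out.\n"
    "Answer: 408\n\n"
)

def build_prompt(question: str) -> str:
    """Prepend long-CoT demonstration(s) to the target question (ICL, tuning-free)."""
    return DEMO + f"Problem: {question}\nSolution:"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map="auto")

prompt = build_prompt("Find the sum of all positive divisors of 36.")
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Sample K completions to estimate pass@K; pass@1 would use a single sample.
outputs = model.generate(
    **inputs,
    max_new_tokens=1024,
    do_sample=True,
    temperature=0.7,
    num_return_sequences=4,
)
for seq in outputs:
    completion = tokenizer.decode(seq[inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    print(completion)
```

In this sketch, a completion counts toward pass@K if any of the K sampled solutions reaches the correct final answer; the comparison in the paper is between this demonstration-prefixed prompting and direct generation from the bare question.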
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 18552