Keywords: Model Merging, Reasoning, Long-CoT
Abstract: Large Reasoning Models (LRMs) with long chain-of-thought reasoning have recently achieved remarkable success.
Yet, equipping domain-specialized models with such reasoning capabilities, referred to as "Reasoning + X", remains a significant challenge.
While model merging offers a promising training-free solution, existing methods often suffer from a destructive performance collapse, simultaneously weakening reasoning depth and compromising domain-specific utility.
Interestingly, we identify a counter-intuitive phenomenon underlying this failure: \textit{reasoning ability predominantly resides in parameter regions with low gradient sensitivity, contrary to the common assumption that domain capabilities correspond to high-magnitude parameters}.
Motivated by this insight, we propose \textbf{ReasonAny}, a novel merging framework that resolves the {reasoning–domain performance collapse} through Contrastive Gradient Identification.
Experiments across safety, biomedicine, and finance domains show that ReasonAny effectively synthesizes "Reasoning + X" capabilities, significantly outperforming state-of-the-art baselines while retaining robust reasoning performance.
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: applications, chain-of-thought, safety and alignment, biomedical QA, math QA
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English
Submission Number: 4655