Abstract: Large reasoning models (LRMs) can perform complex reasoning via long chain-of-thought (CoT), involving cognitive strategies such as backtracking and self-correction. Recent studies suggest that some models inherently possess these long reasoning abilities, which may be unlocked via extra training. Our work investigates whether we can elicit such behavior without any training. To this end, we propose a decoding-time approach, ThinkLogit, which uses logit arithmetic to steer a target large LM toward long reasoning using a substantially smaller model as a guider. We then show that we can further boost performance by training the guider model with preference optimization over correct/incorrect reasoning pairs sampled from both the target and guider models, a setup we refer to as ThinkLogit-DPO. Our experiments demonstrate that ThinkLogit and ThinkLogit-DPO achieve relative pass@1 improvements of 24.5% and 29.1%, respectively, across five mathematical and scientific reasoning datasets, using Qwen2.5-32B guided by R1-Distill-Qwen-1.5B, a model 21x smaller. Ablation studies confirm that ThinkLogit-DPO succeeds only when it couples a preference-learning objective with training pairs drawn from both the target and guider models. Our work presents a computationally efficient method to elicit long reasoning in large models with minimal or no additional training.
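For intuition, below is a minimal sketch of logit-arithmetic guided decoding in the spirit the abstract describes, assuming a proxy-tuning-style contrast z = z_target + alpha * (z_guider - z_base). The base checkpoint (Qwen2.5-Math-1.5B), the value of alpha, and greedy decoding are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of logit-arithmetic guided decoding. Hedged: the exact
# ThinkLogit arithmetic may differ; this follows a proxy-tuning-style
# contrast z = z_target + alpha * (z_guider - z_base).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

TARGET = "Qwen/Qwen2.5-32B"                           # large target LM (from the abstract)
GUIDER = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"  # small long-reasoning guider
BASE = "Qwen/Qwen2.5-Math-1.5B"                       # ASSUMPTION: base of the guider

# All three models share the Qwen tokenizer, so their logits are aligned
# token-for-token, a prerequisite for adding logits elementwise.
tok = AutoTokenizer.from_pretrained(TARGET)
models = {
    name: AutoModelForCausalLM.from_pretrained(
        name, torch_dtype=torch.bfloat16, device_map="auto"
    ).eval()
    for name in (TARGET, GUIDER, BASE)
}

@torch.no_grad()
def last_logits(model, ids):
    """Next-token logits at the last position, moved to CPU in fp32."""
    return model(ids.to(model.device)).logits[:, -1, :].float().cpu()

@torch.no_grad()
def generate(prompt, max_new_tokens=512, alpha=1.0):
    # No KV cache for brevity: each step reruns the full prefix.
    ids = tok(prompt, return_tensors="pt").input_ids
    for _ in range(max_new_tokens):
        z = last_logits(models[TARGET], ids) + alpha * (
            last_logits(models[GUIDER], ids) - last_logits(models[BASE], ids)
        )
        nxt = z.argmax(dim=-1, keepdim=True)  # greedy decoding for simplicity
        ids = torch.cat([ids, nxt], dim=-1)
        if nxt.item() == tok.eos_token_id:
            break
    return tok.decode(ids[0], skip_special_tokens=True)

print(generate("Question: What is 17 * 24? Think step by step."))
```

The guider/base contrast isolates the "long reasoning" signal learned by the small distilled model, so the large target model supplies knowledge while the small pair supplies the reasoning style, without any training of the target.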
Paper Type: Long
Research Area: Question Answering
Research Area Keywords: reasoning, math QA
Contribution Types: NLP engineering experiment, Approaches low compute settings-efficiency
Languages Studied: English
Reassignment Request Area Chair: This is not a resubmission
Reassignment Request Reviewers: This is not a resubmission
A1 Limitations Section: This paper has a limitations section.
A2 Potential Risks: N/A
B Use Or Create Scientific Artifacts: Yes
B1 Cite Creators Of Artifacts: Yes
B1 Elaboration: Section 3.1
B2 Discuss The License For Artifacts: No
B2 Elaboration: We use publicly available datasets and open-sourced models that are widely adopted in academic research. All artifacts used in this work are governed by their respective licenses, which permit research use and distribution.
B3 Artifact Use Consistent With Intended Use: No
B3 Elaboration: Our use of these publicly available datasets and open-sourced models is consistent with their intended research use: their respective licenses permit research use and distribution.
B4 Data Contains Personally Identifying Info Or Offensive Content: No
B4 Elaboration: The datasets contain problems from mathematical competitions and science literature, with no risk of containing personal info or offensive content.
B5 Documentation Of Artifacts: Yes
B5 Elaboration: Section 3.1
B6 Statistics For Data: Yes
B6 Elaboration: Section 3.1
C Computational Experiments: Yes
C1 Model Size And Budget: Yes
C1 Elaboration: Section 3.1
C2 Experimental Setup And Hyperparameters: Yes
C2 Elaboration: Section 3.1, Appendix A.1
C3 Descriptive Statistics: Yes
C3 Elaboration: Section 3.2
C4 Parameters For Packages: Yes
C4 Elaboration: Appendix A.1
D Human Subjects Including Annotators: No
D1 Instructions Given To Participants: N/A
D2 Recruitment And Payment: N/A
D3 Data Consent: N/A
D4 Ethics Review Board Approval: N/A
D5 Characteristics Of Annotators: N/A
E Ai Assistants In Research Or Writing: Yes
E1 Information About Use Of Ai Assistants: Yes
E1 Elaboration: We used AI assistants to debug code and to polish the writing of the draft.
Author Submission Checklist: Yes
Submission Number: 961