One-Pass to Reason: Token Duplication and Block-Sparse Mask for Efficient Fine-Tuning on Multi-Turn Reasoning
Abstract: Fine-tuning Large Language Models (LLMs) on multi-turn reasoning datasets requires N separate forward passes per conversation (one per turn) because of reasoning-token visibility constraints: the reasoning tokens of a turn are discarded in subsequent turns. We propose duplicating response tokens, combined with a custom block-sparse attention mask, to enable single-pass processing of entire conversations. We prove our method produces losses identical to those of the N-pass approach while reducing time complexity from $O\bigl(N^{3}\bigr)$ to $O\bigl(N^{2}\bigr)$ and maintaining the same memory complexity for a transformer-based model. Our approach achieves significant training speedup while preserving accuracy. Our implementation is available at https://anonymous.4open.science/r/one-pass-to-reason-F6CB/.
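To make the masking idea in the abstract concrete, below is a minimal, hypothetical sketch of how such a block-sparse mask could be constructed; it is not the released implementation (see the repository link above for that). It assumes each turn is laid out as prompt, reasoning, response, and a duplicated response copy, and the segment kinds and the function name build_one_pass_mask are illustrative assumptions rather than names from the paper.

```python
import torch

def build_one_pass_mask(segments):
    """Build a (T, T) boolean attention mask (True = query may attend to key).

    segments: list of (turn, kind, length) tuples in sequence order, with
    kind in {"prompt", "reasoning", "response", "response_dup"}, where
    "response_dup" is the duplicated response copy later turns attend to.
    """
    spans, pos = [], 0
    for turn, kind, length in segments:
        spans.append((turn, kind, pos, pos + length))
        pos += length
    mask = torch.zeros(pos, pos, dtype=torch.bool)
    for qi, (q_turn, q_kind, q_s, q_e) in enumerate(spans):
        for ki, (k_turn, k_kind, k_s, k_e) in enumerate(spans):
            if ki > qi:
                continue  # later segments are never visible (causality)
            if ki == qi:
                # standard causal (lower-triangular) attention inside a segment
                idx = torch.arange(q_s, q_e)
                mask[q_s:q_e, k_s:k_e] = idx[:, None] >= idx[None, :]
            elif k_turn < q_turn:
                # earlier turns: only prompts and duplicated responses are
                # visible; their reasoning (and the reasoning-bearing original
                # response) is hidden, matching inference, where reasoning
                # tokens are discarded after each turn
                mask[q_s:q_e, k_s:k_e] = k_kind in ("prompt", "response_dup")
            else:
                # same turn, earlier segment: the duplicated response sees only
                # the prompt (it stands in for a reasoning-free context), while
                # the original stream sees everything that precedes it
                mask[q_s:q_e, k_s:k_e] = (q_kind != "response_dup") or (k_kind == "prompt")
    return mask

# Example: a two-turn conversation with the assumed per-turn layout
segs = [(0, "prompt", 4), (0, "reasoning", 6), (0, "response", 3), (0, "response_dup", 3),
        (1, "prompt", 4), (1, "reasoning", 6), (1, "response", 3), (1, "response_dup", 3)]
mask = build_one_pass_mask(segs)  # shape (32, 32)
```

A mask built this way can be passed to any attention implementation that accepts arbitrary boolean masks (e.g., torch.nn.functional.scaled_dot_product_attention via attn_mask), so the whole conversation is processed in one forward pass instead of N truncated ones.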
Paper Type: Short
Research Area: Special Theme (conference specific)
Research Area Keywords: Advanced Reasoning and Chain-of-Thought in Language Models
Contribution Types: Approaches low compute settings-efficiency
Languages Studied: English
Reassignment Request Area Chair: This is not a resubmission
Reassignment Request Reviewers: This is not a resubmission
Data: zip
A1 Limitations Section: This paper has a limitations section.
A2 Potential Risks: No
A2 Elaboration: Our contribution is an efficiency-only training technique applied to fully synthetic, non-sensitive data (Sections 2 and 3.1); we therefore do not identify novel societal or safety risks beyond those already known for generic LLM fine-tuning.
B Use Or Create Scientific Artifacts: Yes
B1 Cite Creators Of Artifacts: Yes
B1 Elaboration: Section 3, 3.1
B2 Discuss The License For Artifacts: Yes
B2 Elaboration: Section 3
B3 Artifact Use Consistent With Intended Use: Yes
B3 Elaboration: Section 3
B4 Data Contains Personally Identifying Info Or Offensive Content: No
B4 Elaboration: The dataset consists of synthetically generated math content; it contains no private, personal, or otherwise confidential information.
B5 Documentation Of Artifacts: Yes
B5 Elaboration: Appendix E.1
B6 Statistics For Data: Yes
B6 Elaboration: Appendix E.1
C Computational Experiments: Yes
C1 Model Size And Budget: Yes
C1 Elaboration: Section 3, Appendix E.3
C2 Experimental Setup And Hyperparameters: Yes
C2 Elaboration: Section 3, Appendix E.3
C3 Descriptive Statistics: Yes
C3 Elaboration: Section 3.3, Appendix E.4
C4 Parameters For Packages: Yes
C4 Elaboration: Section 3, Appendix E.3
D Human Subjects Including Annotators: No
D1 Instructions Given To Participants: N/A
D2 Recruitment And Payment: N/A
D3 Data Consent: N/A
D4 Ethics Review Board Approval: N/A
D5 Characteristics Of Annotators: N/A
E Ai Assistants In Research Or Writing: Yes
E1 Information About Use Of Ai Assistants: Yes
E1 Elaboration: Section 3.1 mentions our use of gpt-4.1-mini for dataset generation. We used ChatGPT purely for wording suggestions and proofreading the manuscript, and the Cursor AI coding assistant for small refactoring and boilerplate suggestions in the code implementation. All AI-generated text or code was reviewed, edited, and validated by the authors before inclusion.
Author Submission Checklist: Yes
Submission Number: 956