Abstract: Prior work has successfully applied Reinforcement Learning (RL) to mathematical reasoning—where rules and correctness are well-defined. Yet, generalizing these methods to broader reasoning domains remains challenging due to limited data and the lack of verifiable rewards for unstructured domains. In this work, we propose CrossThink, a framework that systematically incorporates multi-domain corpora into RL training to improve generalization across diverse reasoning tasks. CrossThink addresses key challenges by (1) combining data from varied sources; (2) applying structured templates to control answer-space complexity; (3) filtering for verifiable answers; and (4) optimizing data blending strategies to utilize multi-source data effectively. This enables scalable and verifiable reward modeling beyond math and demonstrates improved accuracies on both math (MATH-500: +30.1\%, AMC23: +27.5\%) and non-math reasoning benchmarks (MMLU-Pro: +12.8\%, GPQA-Diamond: +11.3\%, AGIEval: +15.1\%, SuperGPQA: +3.8\%). Moreover, CrossThink exhibits significantly improved response efficiency—using 28\% fewer tokens for correct answers—highlighting more focused and effective reasoning. Through CrossThink, we demonstrate that integrating multi-domain, multi-format data in RL leads to more accurate, efficient, and generalizable LLMs.
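To make the data-curation steps above concrete, the following is a minimal illustrative sketch of points (2)-(4): templating questions to bound the answer space, filtering for automatically verifiable answers, and blending domains by weight. It is an assumption-laden mock-up for exposition, not the CrossThink implementation; the `Sample` record, the verifiability heuristic, and the blending weights are all hypothetical.

```python
from dataclasses import dataclass
import random

# Hypothetical record type; field names are illustrative, not from the paper.
@dataclass
class Sample:
    domain: str     # e.g. "math", "law", "science"
    question: str
    answer: str     # reference answer used for rule-based reward checking

MCQ_TEMPLATE = (
    "Answer the following question. Choose exactly one option.\n"
    "{question}\nOptions:\n{options}\n"
)

def apply_template(sample: Sample, options: list[str]) -> str:
    """Recast an open-ended question as multiple choice to control answer-space complexity."""
    opts = "\n".join(f"({chr(65 + i)}) {o}" for i, o in enumerate(options))
    return MCQ_TEMPLATE.format(question=sample.question, options=opts)

def is_verifiable(sample: Sample) -> bool:
    """Toy filter: keep samples whose answers can be checked by simple string matching
    (short, unambiguous strings rather than free-form essays)."""
    return len(sample.answer.split()) <= 5

def blend(domain_pools: dict[str, list[Sample]], weights: dict[str, float],
          total: int, seed: int = 0) -> list[Sample]:
    """Draw a training mixture according to per-domain blending weights."""
    rng = random.Random(seed)
    mix: list[Sample] = []
    for domain, pool in domain_pools.items():
        pool = [s for s in pool if is_verifiable(s)]
        k = int(total * weights.get(domain, 0.0))
        mix.extend(rng.sample(pool, min(k, len(pool))))
    rng.shuffle(mix)
    return mix
```

In a sketch like this, the per-domain weights passed to `blend` are the tunable part of the recipe that the blending-strategy search in point (4) would optimize.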
Paper Type: Long
Research Area: Machine Learning for NLP
Research Area Keywords: reinforcement learning, reasoning, chain-of-thought, LLM
Contribution Types: NLP engineering experiment, Approaches low compute settings-efficiency, Data resources
Languages Studied: English
Submission Number: 4653