From Language to Logic: Unlocking General Reasoning by Training on Natural Language Inference

ACL ARR 2026 January Submission9994 Authors

06 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: Natural Language Inference (NLI), textual entailment, verifiable reward RL (RLVR), logic-first post-training, Group Relative Policy Optimization (GRPO), logic-centric reward, NLI paradox, cross-domain transfer, general reasoning, math reasoning, code generation, token-efficient reasoning
Abstract: Reinforcement learning with verifiable rewards (RLVR) has improved LLM reasoning in domains such as math and code, but the gains can be domain-specific and do not always transfer. We revisit Natural Language Inference (NLI) and observe an \textbf{NLI paradox}: despite massive pretraining, general-purpose LLMs can trail a strong DeBERTa-style NLI encoder by \textbf{about 7 points on average} and show weak separation between 'simple' and 'hard' NLI instances, signaling a failure to master fundamental logical relations. To bridge this gap, we recast NLI as a verifiable, generative reinforcement learning (RL) task. By optimizing Qwen-2.5-instruct models with Group Relative Policy Optimization (GRPO) and a logic-centric reward, we force the internalization of relational logical primitives. Our approach resolves the NLI paradox, achieving a \textbf{12\% performance leap} and surpassing DeBERTa baselines. Most notably, pure NLI training exhibits powerful \textbf{cross-domain transfer}: without any domain-specific data, it yields average gains of \textbf{+3.6\%} in math, \textbf{+2.4\%} in code, and \textbf{+1.3\%} on general reasoning benchmarks, with a remarkable \textbf{+26.5\%} boost on MATH500 (3B). Furthermore, NLI-trained models generate \textbf{7.4\% fewer tokens}, demonstrating a more efficient and compact reasoning style. Our results suggest that NLI-based RL strengthens universal meta-reasoning skills---such as consistency checking and evidence integration---enabling broader cognitive transfer than traditional math-centric post-training.
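The abstract's core recipe, recasting NLI as an RLVR task with GRPO, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the label-extraction rule, the exact-match reward, and the helper names (`nli_reward`, `grpo_advantages`) are assumptions for exposition only.

```python
import re

# Gold NLI labels form a small closed set, which is what makes the
# reward verifiable: a generated answer either names the gold relation
# or it does not.
LABELS = ("entailment", "neutral", "contradiction")

def nli_reward(completion: str, gold_label: str) -> float:
    """Hypothetical logic-centric reward: 1.0 if the last NLI label
    mentioned in the completion matches the gold label, else 0.0."""
    found = [m.lower() for m in re.findall(r"entailment|neutral|contradiction",
                                           completion, flags=re.IGNORECASE)]
    return 1.0 if found and found[-1] == gold_label.lower() else 0.0

def grpo_advantages(rewards):
    """Group-relative advantages as in GRPO: standardize the rewards of
    a group of sampled completions for the same prompt (no value network)."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = (sum((r - mean) ** 2 for r in rewards) / n) ** 0.5
    return [(r - mean) / (std + 1e-8) for r in rewards]

# Example: two sampled completions for one premise/hypothesis pair.
rewards = [nli_reward("Step by step... so the answer is entailment.", "entailment"),
           nli_reward("The hypothesis contradicts the premise. Contradiction.", "entailment")]
advantages = grpo_advantages(rewards)  # correct completion gets a positive advantage
```

The group-relative normalization is what lets a sparse 0/1 reward still produce a useful learning signal: within each group, completions that verify correctly are pushed up relative to those that do not.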
Paper Type: Long
Research Area: Mathematical, Symbolic, Neurosymbolic, and Logical Reasoning
Research Area Keywords: Natural language inference, Reinforcement learning, Mathematical reasoning, Logical reasoning, Generalization
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English
Submission Number: 9994