WirelessMathLM: Teaching Mathematical Reasoning for LLMs in Wireless Communications with Reinforcement Learning
Keywords: large language models, mathematical reasoning, wireless communications, reinforcement learning, benchmark datasets
Abstract: Large Language Models (LLMs) struggle with specialized mathematical reasoning in domains governed by strict physical constraints, such as wireless communications. Progress is currently stifled by a lack of training-scale resources; existing datasets are either purely evaluative or insufficient in volume for robust adaptation. To bridge this gap, we present WirelessMathBench-XL, the first training-scale benchmark for wireless mathematics, comprising 4,027 problems derived from 970 state-of-the-art papers. We employ a rigorous construction pipeline combining automated extraction with human expert verification to ensure robust generalization assessment. The dataset features a hierarchical taxonomy consisting of multiple-choice, progressive fill-in-the-blank, and full equation completion tasks, explicitly designed to test comprehension depth. To validate the corpus, we train baseline models (WirelessMathLM) using reinforcement learning with verification rewards. Our 7B model achieves 39.5% accuracy, rivaling GPT-4o (40.4%) and demonstrating that the dataset provides sufficient signal for small models to master complex domain logic. Further analysis reveals that training on this specialized corpus improves performance on general mathematical benchmarks without catastrophic forgetting, while also showing consistent improvements across broad knowledge, science, and coding tasks.
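The abstract's "reinforcement learning with verification rewards" typically means a binary reward computed by programmatically checking the model's final answer against a gold reference. A minimal sketch of such a reward function is shown below; the function names and the string-normalization rules are illustrative assumptions, not the paper's actual implementation (which may use symbolic equivalence checking instead of string matching).

```python
# Hypothetical sketch of a binary verification reward for RL training.
# Assumption: answers are LaTeX strings compared after light normalization;
# the paper's real verifier is not specified here.

def normalize(expr: str) -> str:
    """Canonicalize an answer string for exact-match comparison."""
    return (expr.replace(" ", "")
                .replace("\\left", "")
                .replace("\\right", "")
                .lower())

def verification_reward(model_answer: str, gold_answer: str) -> float:
    """Return 1.0 if the model's final expression matches the gold one, else 0.0."""
    return 1.0 if normalize(model_answer) == normalize(gold_answer) else 0.0

print(verification_reward(r"\log_2(1 + SNR)", r"\log_2 (1 + snr)"))  # 1.0
print(verification_reward(r"\log_2(1 + SNR)", r"\log_2(SNR)"))       # 0.0
```

Because the reward is sparse and exactly verifiable, it avoids the reward-hacking issues of learned reward models, which is the usual motivation for this setup in mathematical domains.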
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: Evaluation, Mathematical Reasoning, Domain-specific NLP, Benchmark, Wireless Communications
Contribution Types: NLP engineering experiment, Publicly available software and/or pre-trained models, Data resources
Languages Studied: English
Submission Number: 8355