WirelessMathLM: Teaching Mathematical Reasoning for LLMs in Wireless Communications with Reinforcement Learning
Keywords: large language models, mathematical reasoning, wireless communications, reinforcement learning, benchmark datasets
Abstract: Large Language Models (LLMs) struggle with specialized mathematical reasoning in domains governed by strict physical constraints, such as wireless communications. Progress is currently stifled by a lack of training-scale resources; existing datasets are either purely evaluative or insufficient in volume for robust adaptation. To bridge this gap, we present WirelessMathBench-XL, the first training-scale benchmark for wireless mathematics, comprising 4,027 problems derived from 970 state-of-the-art papers. We employ a rigorous construction pipeline combining automated extraction with human expert verification to ensure robust generalization assessment. The dataset features a hierarchical taxonomy consisting of multiple-choice, progressive fill-in-the-blank, and full equation completion tasks, explicitly designed to test comprehension depth. To validate the corpus, we train baseline models (WirelessMathLM) using reinforcement learning with verification rewards. Our 7B model achieves 39.5% accuracy, rivaling GPT-4o (40.4%) and demonstrating that the dataset provides sufficient signal for small models to master complex domain logic. Further analysis reveals that training on this specialized corpus improves performance on general mathematical benchmarks without catastrophic forgetting, while also showing consistent improvements across broad knowledge, science, and coding tasks.
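The abstract's "reinforcement learning with verification rewards" typically means a binary reward computed by programmatically checking the model's final answer against a gold reference. A minimal sketch of such a reward function is shown below; the function names and the string-normalization rules are illustrative assumptions, not the paper's actual implementation (which may use symbolic equivalence checking instead of string matching).

```python
# Hypothetical sketch of a binary verification reward for RL training.
# Assumption: answers are LaTeX strings compared after light normalization;
# the paper's real verifier is not specified here.

def normalize(expr: str) -> str:
    """Canonicalize an answer string for exact-match comparison."""
    return (expr.replace(" ", "")
                .replace("\\left", "")
                .replace("\\right", "")
                .lower())

def verification_reward(model_answer: str, gold_answer: str) -> float:
    """Return 1.0 if the model's final expression matches the gold one, else 0.0."""
    return 1.0 if normalize(model_answer) == normalize(gold_answer) else 0.0

print(verification_reward(r"\log_2(1 + SNR)", r"\log_2 (1 + snr)"))  # 1.0
print(verification_reward(r"\log_2(1 + SNR)", r"\log_2(SNR)"))       # 0.0
```

Because the reward is sparse and exactly verifiable, it avoids the reward-hacking issues of learned reward models, which is the usual motivation for this setup in mathematical domains.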
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: Evaluation, Mathematical Reasoning, Domain-specific NLP, Benchmark, Wireless Communications
Contribution Types: NLP engineering experiment, Publicly available software and/or pre-trained models, Data resources
Languages Studied: English
Submission Number: 8355