Can Large Language Models Automate the Refinement of Cellular Network Specifications?

ACL ARR 2026 January Submission147 Authors

22 Dec 2025 (modified: 20 Mar 2026) | ACL ARR 2026 January Submission | License: CC BY 4.0
Keywords: language model, application, cellular network
Abstract: Cellular networks (e.g., 4G/5G) rely on complex technical specifications to ensure correct functionality; however, these specifications often contain flaws or ambiguities. In this paper, we investigate the application of Large Language Models (LLMs) to automated cellular network specification refinement. We identify Change Requests, which record specification revisions, as a key source of domain-specific data and formulate refinement as three complementary sub-tasks. We introduce CR-Eval, a benchmark of 200 security-related test cases, and evaluate 17 open-source and 14 proprietary models. The best-performing model, GPT-o3-mini, identifies weaknesses in more than 127 test cases within five trials. We further study LLM specialization, showing that a fine-tuned 8B model can outperform advanced LLMs such as DeepSeek-R1 and Qwen3-235B. Evaluations on 30 real-world cellular attacks demonstrate the practical impact of this approach and reveal remaining challenges. The codebase and benchmark are available at https://anonymous.4open.science/r/CR-Eval.
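The headline result counts test cases in which a model identifies a weakness "within five trials", i.e., a success-within-k-trials tally. The snippet below is a minimal sketch of how such a count could be computed, assuming each trial on each test case receives a binary success judgment; the abstract does not describe the judging procedure, and the function name `solved_within_k` and the toy data are hypothetical, not taken from the CR-Eval codebase.

```python
from typing import Sequence


def solved_within_k(trials: Sequence[Sequence[bool]], k: int = 5) -> int:
    """Count test cases where at least one of the first k trials succeeded.

    trials[i][j] is True if trial j on test case i was judged successful.
    This binary representation is an assumption for illustration only.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    # A test case counts once if any of its first k trials succeeded.
    return sum(1 for case in trials if any(case[:k]))


# Toy example: 3 hypothetical test cases with 5 trials each (made-up data).
toy = [
    [False, False, True, False, False],  # first success on trial 3
    [False] * 5,                         # never succeeds
    [True, True, False, False, True],    # first success on trial 1
]
print(solved_within_k(toy, k=5))  # 2
print(solved_within_k(toy, k=1))  # 1
```

Under this reading, a larger k can only keep or raise the count, so results reported "within five trials" should be compared only against other results using the same k.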
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: applications, security/privacy
Contribution Types: Model analysis & interpretability, Data resources
Languages Studied: English
Submission Number: 147