Can Large Language Models Automate the Refinement of Cellular Network Specifications?

ACL ARR 2026 January Submission147 Authors

22 Dec 2025 (modified: 20 Mar 2026), ACL ARR 2026 January Submission, CC BY 4.0

Keywords: language model, application, cellular network

Abstract: Cellular networks, e.g., 4G/5G, rely on complex technical specifications to ensure correct functionality; however, these specifications often contain flaws or ambiguities. In this paper, we investigate the application of Large Language Models (LLMs) to automated cellular network specification refinement. We identify Change Requests, which record specification revisions, as a key source of domain-specific data and formulate refinement as three complementary sub-tasks. We introduce CR-Eval, a benchmark of 200 security-related test cases, and evaluate 17 open-source and 14 proprietary models. The best-performing model, GPT-o3-mini, identifies weaknesses in over 127 test cases within five trials. We further study LLM specialization, showing that a fine-tuned 8B model can outperform advanced LLMs such as DeepSeek-R1 and Qwen3-235B. Evaluations on 30 real-world cellular attacks demonstrate the practical impact and remaining challenges. The codebase and benchmark are available at https://anonymous.4open.science/r/CR-Eval.

Paper Type: Long

Research Area: NLP Applications

Research Area Keywords: applications, security/privacy

Contribution Types: Model analysis & interpretability, Data resources

Languages Studied: English

Submission Number: 147
