Keywords: language model, application, cellular network
Abstract: Cellular networks, e.g., 4G/5G, rely on complex technical specifications to ensure correct functionality; however, these specifications often contain flaws or ambiguities.
In this paper, we investigate the application of Large Language Models (LLMs) for \textit{automated cellular network specification refinement}.
We identify Change Requests, which record specification revisions, as a key source of domain-specific data and formulate refinement as three complementary sub-tasks.
We introduce CR-Eval, a benchmark of 200 security-related test cases, and evaluate 17 open-source and 14 proprietary models.
The best-performing model, \texttt{GPT-o3-mini}, identifies weaknesses in over 127 test cases within five trials.
We further study LLM specialization, showing that fine-tuning an 8B model can outperform advanced LLMs such as \texttt{DeepSeek-R1} and \texttt{Qwen3-235B}.
Evaluations on 30 real-world cellular attacks demonstrate the practical impact of this approach and the challenges that remain.
The codebase and benchmark are available at \url{https://anonymous.4open.science/r/CR-Eval}.
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: applications, security/privacy
Contribution Types: Model analysis & interpretability, Data resources
Languages Studied: English
Submission Number: 147