Specializing Language Models for 3GPP Standards: Enhancements for Technical Document Queries

Gary C. F. Lee, Derek Khu, Feri Guretno, Ernest Kurniawan

Published: 2024, Last Modified: 07 Nov 2025, GLOBECOM (Workshops) 2024, CC BY-SA 4.0

Abstract: This paper presents a novel approach to enhancing open-source language models for querying Third Generation Partnership Project (3GPP)-related technical documents, utilizing multiple-choice questions from the TeleQnA dataset as part of an International Telecommunication Union (ITU) Artificial Intelligence/Machine Learning (AI/ML) in 5G Challenge. Our primary focus is on the Phi-2 model, demonstrating that the integration of appropriately designed Retrieval-Augmented Generation (RAG), prompt engineering, and fine-tuning significantly enhances performance in handling complex technical standards-related queries. Our methodology leverages natural language processing techniques and re-ranking strategies, optimization of prompt ordering, and model fine-tuning. With our proposed methodology, we achieve an accuracy of 79.65% on a held-out test set based on TeleQnA. We address the challenges associated with adapting small language models to domain-specific tasks, offering insights into effective techniques for improving model performance within a resource-constrained setting. This research contributes to the field of telecommunications and language modeling, offering practical implications for future research and applications in this domain.
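
The abstract names three ingredients on the retrieval side: retrieval, re-ranking, and prompt ordering. As a rough illustration only, the toy Python sketch below shows how such stages can fit together for a multiple-choice question. It is not the authors' implementation: the scoring functions (bag-of-words cosine similarity, then term overlap with the question and answer options), the function names, the `context_first` switch, and the sample text chunks are all assumptions made for this example, and a real system would likely use learned embeddings and pass the prompt to a model such as Phi-2.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, k=3):
    """First stage (assumed): rank chunks by bag-of-words cosine similarity."""
    q = Counter(tokenize(question))
    ranked = sorted(chunks, key=lambda c: cosine(q, Counter(tokenize(c))), reverse=True)
    return ranked[:k]


def rerank(question, options, candidates, k=2):
    """Second stage (assumed): re-score candidates by term overlap with
    the question together with its answer options."""
    terms = set(tokenize(question + " " + " ".join(options)))
    ranked = sorted(candidates, key=lambda c: len(terms & set(tokenize(c))), reverse=True)
    return ranked[:k]


def build_prompt(question, options, context, context_first=True):
    """Assemble the prompt; context_first is a hypothetical switch for
    comparing the two orderings of context and question."""
    ctx = ["Context:"] + [f"- {c}" for c in context]
    qa = [f"Question: {question}"] + [f"{i + 1}. {o}" for i, o in enumerate(options)]
    parts = ctx + qa if context_first else qa + ctx
    return "\n".join(parts + ["Answer with the option number."])


# Toy chunks written for this example (not quoted from any 3GPP document).
chunks = [
    "The AMF handles registration and mobility management in the 5G core.",
    "The UPF forwards user plane packets between the RAN and data networks.",
    "HARQ retransmissions are managed at the MAC layer.",
]
question = "Which network function handles registration management?"
options = ["AMF", "UPF", "SMF", "PCF"]

context = rerank(question, options, retrieve(question, chunks, k=2), k=1)
print(build_prompt(question, options, context))
```

Exposing the ordering as a parameter makes it possible to evaluate both layouts on a validation set rather than fixing one in advance; which ordering works best is an empirical question that this sketch does not answer.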

External IDs: dblp:conf/globecom/LeeKGK24