Adapting Language Models for Low-Resource Programming Languages

Published: 22 Sept 2025 · Last Modified: 25 Nov 2025 · DL4C @ NeurIPS 2025 Poster · License: CC BY 4.0
Keywords: low-resource programming languages, code-generation, evaluation
TL;DR: We explore how to adapt LLMs for low-resource programming languages by evaluating four key techniques (RAG, agentic architectures, tool calling, and feedback-guided generation) for improving code generation in low-resource languages.
Abstract: Large Language Models (LLMs) have achieved remarkable success in code generation, yet their capabilities remain predominantly concentrated in well-resourced programming languages such as Python and Java. In contrast, low-resource programming languages pose a significant challenge due to their limited training data and distinctive syntax. In this paper, we systematically implement and evaluate four core adaptation techniques (retrieval-augmented generation, agentic architectures, tool calling, and feedback-guided generation) to understand how these models can be improved for underrepresented programming languages. Our findings reveal that tool calling is particularly effective for low-resource languages, yielding larger gains than it does for high-resource counterparts. Conversely, high-resource languages benefit more from agentic workflows and RAG, likely due to the models' deeper familiarity with, and pretraining exposure to, these languages.
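To make the feedback-guided generation idea concrete, below is a minimal sketch of one way such a loop could work: generate a candidate program, run it through the target language's compiler or interpreter, and feed any error output back into the prompt for the next attempt. This is an illustration under stated assumptions, not the paper's implementation: `generate` is a hypothetical stand-in for an LLM call, and the helper names and round limit are placeholders.

```python
import subprocess
import tempfile
from pathlib import Path


def generate(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call; replace with your model client.

    Returns a canned snippet so the sketch runs without network access.
    """
    return 'print("hello")'


def compile_feedback(source: str, compiler: list[str]) -> str | None:
    """Run the target language's compiler/interpreter on a candidate program.

    Returns None on success, or the compiler's error text on failure.
    """
    with tempfile.NamedTemporaryFile("w", suffix=".src", delete=False) as f:
        f.write(source)
        path = f.name
    try:
        result = subprocess.run(compiler + [path], capture_output=True, text=True)
    finally:
        Path(path).unlink()
    return None if result.returncode == 0 else result.stderr


def feedback_guided_generation(task: str, compiler: list[str], max_rounds: int = 3) -> str:
    """Regenerate up to max_rounds times, appending compiler errors to the prompt."""
    candidate = generate(task)
    for _ in range(max_rounds):
        error = compile_feedback(candidate, compiler)
        if error is None:
            break  # candidate compiles cleanly; stop iterating
        task = (
            f"{task}\n\nPrevious attempt:\n{candidate}"
            f"\n\nCompiler error:\n{error}\nPlease fix the code."
        )
        candidate = generate(task)
    return candidate
```

As a usage example, `feedback_guided_generation("Write a Lua function that reverses a string", ["luac", "-p"])` would use Lua's bytecode compiler in parse-only mode as the error oracle; any language with a command-line compiler or linter can supply the feedback signal in the same way.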
Submission Number: 85