Abstract: Large Language Models (LLMs) have achieved remarkable success in code generation, yet their capabilities remain predominantly concentrated in well-resourced programming languages such as Python and Java. Low-resource programming languages, in contrast, pose a significant challenge due to limited available data and distinctive syntactic features. In this paper, we systematically implement and evaluate four core adaptation techniques (retrieval-augmented generation, agentic architectures, tool calling, and feedback-guided generation) to understand how LLMs can be better adapted to underrepresented programming languages. Our findings reveal that tool calling is particularly effective for low-resource languages, yielding larger gains than it does on high-resource counterparts. Conversely, high-resource languages benefit more from agentic workflows and RAG, likely due to the models' deeper familiarity with and greater pretraining exposure to these languages.
Paper Type: Short
Research Area: Resources and Evaluation
Research Area Keywords: low-resource programming languages, code-generation, evaluation
Contribution Types: Model analysis & interpretability, Approaches to low-resource settings
Languages Studied: English
Submission Number: 5568