Training Code LLMs for Low-Resource and Proprietary Languages via Multi-Granular Instruction Tuning and AI Feedback
Keywords: self-supervised learning, low-resource languages, domain-specific languages, coding LLMs, LLM-as-a-judge
Abstract: Large Language Models (LLMs) have significantly advanced automatic code generation in mainstream programming languages but underperform on low-resource, proprietary Domain Specific Languages (DSLs) due to limited training data.
We introduce a fully automated, two-stage training pipeline for coding LLMs. Stage 1 performs supervised fine-tuning (SFT) on multi-granular instructions synthesized directly from raw code and comments. Stage 2 performs reinforcement learning from AI feedback (RLAIF): new instructions are generated, a preferred output is selected for each, and the resulting preference pairs are used for direct preference optimization (DPO).
We evaluate our approach using parser success and similarity metrics, complemented by an LLM-as-a-judge evaluation. Our results suggest that this pipeline enables effective adaptation of LLMs to low-resource and proprietary DSLs under realistic data and evaluation constraints.
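The DPO step in Stage 2 optimizes the standard DPO objective over preference pairs: the policy is pushed to increase the likelihood margin of the preferred output over the rejected one, relative to a frozen reference model. A minimal sketch of the per-pair loss follows; the function name and the toy log-probabilities are illustrative, not taken from the paper.

```python
import math

def dpo_loss(policy_chosen_lp, policy_rejected_lp,
             ref_chosen_lp, ref_rejected_lp, beta=0.1):
    """DPO loss for one preference pair.

    Inputs are summed token log-probabilities of the chosen (preferred)
    and rejected completions under the trained policy and the frozen
    reference model; beta scales the implicit KL penalty.
    """
    # Reward margin: how much more the policy prefers the chosen output
    # than the reference model does, versus the rejected output.
    margin = beta * ((policy_chosen_lp - ref_chosen_lp)
                     - (policy_rejected_lp - ref_rejected_lp))
    # Loss is -log sigmoid(margin); minimized as the margin grows.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# With identical log-probs the margin is zero and the loss is log(2);
# widening the chosen/rejected gap under the policy lowers the loss.
baseline = dpo_loss(-10.0, -10.0, -10.0, -10.0)   # ≈ 0.693
improved = dpo_loss(-5.0, -10.0, -10.0, -10.0)    # smaller
```

In practice this loss would be computed batch-wise over token-level log-probabilities from the fine-tuned and reference checkpoints, but the scalar form above captures the objective the preference pairs feed into.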
Paper Type: Long
Research Area: Machine Learning for NLP
Research Area Keywords: self-supervised learning, generative models, transfer learning / domain adaptation, reinforcement learning
Contribution Types: NLP engineering experiment, Approaches to low-resource settings
Languages Studied: Out-Of-Distribution Domain Specific Language (OOD-DSL), Python, Rust, Go
Submission Number: 3370