Training Code LLMs for Low-Resource and Proprietary Languages via Multi-Granular Instruction Tuning and AI Feedback
Keywords: self-supervised learning, low-resource languages, domain-specific languages, coding LLMs, LLM-as-a-judge
Abstract: Large Language Models (LLMs) have significantly advanced automatic code generation in mainstream programming languages but underperform on low-resource, proprietary Domain Specific Languages (DSLs) due to limited training data.
We introduce a fully automated, two-stage training pipeline for coding LLMs. Stage 1 performs supervised fine-tuning (SFT) on multi-granular instructions synthesized directly from raw code and comments. Stage 2 performs reinforcement learning from AI feedback (RLAIF): new instructions are generated, a preferred output is selected for each, and the resulting preference pairs are used for direct preference optimization (DPO).
We evaluate our approach using parser success and similarity metrics, complemented by an LLM-as-a-judge evaluation. Our results suggest that this pipeline enables effective adaptation of LLMs to low-resource and proprietary DSLs under realistic data and evaluation constraints.
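The DPO step in Stage 2 optimizes the standard DPO objective over preference pairs: the policy is pushed to increase the likelihood margin of the preferred output over the rejected one, relative to a frozen reference model. A minimal sketch of the per-pair loss follows; the function name and the toy log-probabilities are illustrative, not taken from the paper.

```python
import math

def dpo_loss(policy_chosen_lp, policy_rejected_lp,
             ref_chosen_lp, ref_rejected_lp, beta=0.1):
    """DPO loss for one preference pair.

    Inputs are summed token log-probabilities of the chosen (preferred)
    and rejected completions under the trained policy and the frozen
    reference model; beta scales the implicit KL penalty.
    """
    # Reward margin: how much more the policy prefers the chosen output
    # than the reference model does, versus the rejected output.
    margin = beta * ((policy_chosen_lp - ref_chosen_lp)
                     - (policy_rejected_lp - ref_rejected_lp))
    # Loss is -log sigmoid(margin); minimized as the margin grows.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# With identical log-probs the margin is zero and the loss is log(2);
# widening the chosen/rejected gap under the policy lowers the loss.
baseline = dpo_loss(-10.0, -10.0, -10.0, -10.0)   # ≈ 0.693
improved = dpo_loss(-5.0, -10.0, -10.0, -10.0)    # smaller
```

In practice this loss would be computed batch-wise over token-level log-probabilities from the fine-tuned and reference checkpoints, but the scalar form above captures the objective the preference pairs feed into.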
Paper Type: Long
Research Area: Machine Learning for NLP
Research Area Keywords: self-supervised learning, generative models, transfer learning / domain adaptation, reinforcement learning
Contribution Types: NLP engineering experiment, Approaches to low-resource settings
Languages Studied: Out-Of-Distribution Domain Specific Language (OOD-DSL), Python, Rust, Go
Submission Number: 3370