CodeChemist: Test-Time Scaling for Low-Resource Code Generation via Functional Knowledge Transfer

18 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: LLM Code Generation, Test-time Scaling, Low-resource Code
Abstract: Code Large Language Models (CodeLLMs) have been widely adopted for code generation, powering applications with large user bases. Their performance, however, varies sharply across programming languages (PLs) and is particularly suboptimal for low-resource PLs due to data scarcity, limiting their overall usability. In this work, we introduce CodeChemist, a simple yet effective test-time scaling framework that transfers the model's functional knowledge from high-resource to low-resource PLs via synthesized test cases. Specifically, CodeChemist first performs code generation and execution in high-resource PLs to derive test cases that capture functional knowledge, then applies multi-temperature hedged sampling to produce candidate code snippets in the low-resource PL, and finally selects the best candidate by executing the synthesized test cases. Extensive experiments demonstrate that CodeChemist significantly outperforms existing test-time scaling methods, improving code generation for low-resource PLs without retraining.
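The three-stage pipeline sketched in the abstract can be illustrated with a minimal stub. All helper names (`generate_code`, `synthesize_tests`, `hedged_sampling`, `score`) are hypothetical stand-ins, and the model calls are stubbed; the paper's actual prompts, execution harness, and selection rule may differ.

```python
import random

def generate_code(prompt, language, temperature):
    """Stub for a CodeLLM call; a real system would query the model."""
    return f"# {language} solution for: {prompt} (T={temperature})"

def synthesize_tests(prompt, high_resource_lang="Python"):
    """Stage 1 (stubbed): sample solutions in a high-resource PL, execute
    them, and keep the resulting input/output pairs as functional tests."""
    _ = [generate_code(prompt, high_resource_lang, t) for t in (0.2, 0.6, 1.0)]
    # In the real pipeline these solutions run on seed inputs and their
    # outputs become language-agnostic test cases.
    return [{"input": i, "expected": f"out_{i}"} for i in range(3)]

def hedged_sampling(prompt, low_resource_lang, temps=(0.2, 0.6, 1.0), k=2):
    """Stage 2: multi-temperature hedged sampling of candidate snippets."""
    return [generate_code(prompt, low_resource_lang, t)
            for t in temps for _ in range(k)]

def score(candidate, tests):
    """Stage 3 helper (stubbed): fraction of synthesized tests passed.
    A real harness would execute the candidate against each test."""
    return random.random()

def codechemist(prompt, low_resource_lang="Racket"):
    """End-to-end selection: pick the candidate passing the most tests."""
    tests = synthesize_tests(prompt)
    candidates = hedged_sampling(prompt, low_resource_lang)
    return max(candidates, key=lambda c: score(c, tests))

best = codechemist("reverse a linked list")
print(best)
```

The key design point reflected here is that the test cases, not the code, carry the transferred functional knowledge: they are derived by executing high-resource solutions, so candidate selection in the low-resource PL needs no retraining.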
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 10919