Keywords: Small Language Models, Compiler Auto-Parallelization, Heterogeneous Systems, LLM-driven Optimization, Code Generation, Reasoning Strategies
TL;DR: This paper shows that small language models can act as compiler experts for auto-parallelization. Using reasoning strategies like Tree of Thoughts, they outperform traditional compilers, achieving an average 6.81x speedup on real-world code.
Abstract: Traditional auto-parallelizing compilers, which depend on rigid heuristics, struggle with the complexity of modern heterogeneous systems. This paper presents a detailed evaluation of auto-parallelization driven by small (1B-parameter) language models (LLMs) for compilers. We assess three models (gemma3, llama3.2, and qwen2.5), each employing six reasoning strategies, on 11 real-world kernels from scientific computing, graph algorithms, and machine learning. We compare our system against strong compiler baselines, including LLVM Polly, TVM, and Triton. Across 376 evaluations, our LLM-driven method achieves an average speedup of 6.81x and a peak of 43.25x on convolution operations. We examine scalability, confirm correctness using multiple sanitizers, and validate robustness across various compilers and hardware. Our results show that small, efficient LLMs can act as effective reasoning engines for complex compiler optimization tasks.
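To make the pipeline described in the abstract concrete, below is a minimal illustrative sketch (not taken from the paper) of one step of LLM-driven auto-parallelization: prompt a small local model to insert OpenMP pragmas into a C loop nest, then gate the suggestion with a compile check. The Ollama-style HTTP endpoint, the model tag "llama3.2:1b", the prompt wording, and the saxpy kernel are assumptions for illustration; the compile check merely stands in for the paper's sanitizer-based correctness validation.

```python
# Illustrative sketch (not the paper's implementation): ask a small local LLM
# to parallelize a C kernel with OpenMP, then verify the result still compiles.
import subprocess
import tempfile
import textwrap

import requests  # assumes a local Ollama-compatible server at localhost:11434

# Example kernel (assumed for illustration); the paper uses 11 real-world kernels.
KERNEL = textwrap.dedent("""
    void saxpy(int n, float a, float *x, float *y) {
        for (int i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];
    }
""")


def suggest_parallel_version(kernel_src: str, model: str = "llama3.2:1b") -> str:
    """Ask a small LLM to insert OpenMP pragmas; return the rewritten kernel."""
    prompt = (
        "You are a compiler expert. Rewrite this C kernel with OpenMP pragmas "
        "for safe parallel execution. Return only the code.\n\n" + kernel_src
    )
    resp = requests.post(
        "http://localhost:11434/api/generate",  # assumed local model endpoint
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=120,
    )
    return resp.json()["response"]


def compiles_with_openmp(src: str) -> bool:
    """Cheap correctness gate: the transformed kernel must at least compile."""
    with tempfile.NamedTemporaryFile(suffix=".c", mode="w", delete=False) as f:
        f.write(src)
        path = f.name
    result = subprocess.run(
        ["gcc", "-fopenmp", "-c", path, "-o", "/dev/null"],
        capture_output=True,
    )
    return result.returncode == 0


if __name__ == "__main__":
    candidate = suggest_parallel_version(KERNEL)
    print("accepted" if compiles_with_openmp(candidate) else "rejected")
```

In the full system, a reasoning strategy such as Tree of Thoughts would generate and rank several such candidates rather than accepting the first one, and runtime checks (sanitizers, output comparison) would replace the compile-only gate shown here.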
Submission Number: 7