Comparing Large Language Models and Grammatical Evolution for Code Generation

Leonardo Lucio Custode, Chiara Camilla Rambaldi Migliore, Marco Roveri, Giovanni Iacca

Published: 2024, Last Modified: 25 Jan 2025. GECCO Companion 2024. CC BY-SA 4.0

Abstract: Code generation is one of the most valuable applications of AI, as it allows for automated programming and "self-building" programs. Both Large Language Models (LLMs) and evolutionary methods, such as Genetic Programming (GP) and Grammatical Evolution (GE), are known to generate code with reasonable performance. However, to the best of our knowledge, little work has been done so far on a systematic comparison between the two approaches. Most importantly, the only studies that conducted such comparisons used benchmarks from the GP community, which, in our opinion, may have biased the results in favor of GP. In this work, we compare LLMs and evolutionary methods, in particular GE, using instead a well-known benchmark originating from the LLM community. Our results show that, in this scenario, LLMs can solve significantly more tasks than GE, indicating that GE struggles to match the performance of LLMs on code generation tasks that have different properties from those commonly used in the GP community.