Keywords: case-based reasoning, few-shot prompting, large language models, code generation
TL;DR: Using a case-based reasoning approach, this paper demonstrates that dynamic few-shot prompting outperforms static few-shot and zero-shot prompting in LLM-based code generation, and it categorizes and analyzes seven prevalent failure types.
Abstract: Large language models have recently succeeded in various code generation tasks but still struggle to generate task plans for complex, real-world problems that need detailed, context-aware planning and execution. This work aims to improve these models' accuracy in generating task plans from natural language instructions. These task plans, represented as Python code, use custom functions to accomplish the user's request as specified in natural language. The task plans are multi-step, often include loops, and are executed in a Python runtime environment. Our approach uses case-based reasoning to perform dynamic few-shot prompting, improving the large language models' ability to accurately follow planning prompts. We compare the effectiveness of dynamic prompting with static three-shot and zero-shot prompting approaches, finding that dynamic prompting improves the accuracy of the generated code. Additionally, we identify and discuss seven types of failures in code generation.
Submission Number: 8
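The dynamic few-shot idea described in the abstract can be illustrated with a minimal sketch: for each new instruction, retrieve the most similar previously solved cases and place them in the prompt as examples. Everything concrete here is an assumption made for illustration and is not taken from the paper: the case base contents, the custom function names inside the plans (`water`, `list_plants`, and so on), the bag-of-words cosine similarity, and the prompt layout. The paper's actual retrieval method and prompt format may differ.

```python
import math
import re
from collections import Counter

# Hypothetical case base of (instruction, task-plan code) pairs.
# The plan strings call made-up custom functions; they are only stored as text.
CASE_BASE = [
    ("water every plant in the greenhouse",
     "for plant in list_plants('greenhouse'):\n    water(plant)"),
    ("send the weekly report to the team",
     "report = build_report('weekly')\nsend(report, 'team')"),
    ("turn off all the lights in the kitchen",
     "for light in list_lights('kitchen'):\n    turn_off(light)"),
    ("water the plant on the desk",
     "water(find_plant('desk'))"),
]


def bag_of_words(text):
    """Lowercase token counts; a stand-in for whatever representation is really used."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, k=3):
    """Return the k cases whose instructions are most similar to the query."""
    q = bag_of_words(query)
    ranked = sorted(CASE_BASE,
                    key=lambda case: cosine(q, bag_of_words(case[0])),
                    reverse=True)
    return ranked[:k]


def build_prompt(query, k=3):
    """Assemble a few-shot prompt from retrieved cases, ending with the new instruction."""
    parts = [f"# Instruction: {instruction}\n{plan}\n"
             for instruction, plan in retrieve(query, k)]
    parts.append(f"# Instruction: {query}\n")
    return "\n".join(parts)


if __name__ == "__main__":
    print(build_prompt("water all the plants on the balcony", k=2))
```

A static three-shot baseline would correspond to always using the same three cases regardless of the query, and zero-shot to `k=0`. The retrieval step is the only part that changes per instruction, which is why the quality of the similarity measure matters; a plain bag-of-words score like the one above is sensitive to common words and would likely be replaced by something stronger in practice.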