Revisiting Chain-of-Thought in Code Generation: Do Language Models Need to Learn Reasoning before Coding?
TL;DR: We propose that generating code first and then providing CoT as explanations can improve CodeLLM performance with Supervised Fine-Tuning.
Abstract: Large Language Models (LLMs) have demonstrated exceptional performance in code generation, becoming increasingly vital for software engineering and development. Recently, Chain-of-Thought (CoT) has proven effective for complex tasks by prompting LLMs to reason step-by-step and provide a final answer.
However, research on *how LLMs learn to reason with CoT data for code generation* remains limited.
In this work, we revisit classic CoT training, in which the model learns to produce reasoning steps before the final answer.
We synthesize a dataset that separates the CoT process from code solutions and then conduct extensive experiments to empirically study how CoT works in code generation.
We observe counterintuitive phenomena, suggesting that the traditional training paradigm may not yield benefits for code generation. Instead, training LLMs to generate code first and then output the CoT that explains the reasoning behind it is more effective.
Specifically, our results indicate that a 9.86% relative performance improvement can be achieved simply by reversing the order of CoT and code. Our findings provide valuable insights into leveraging CoT to enhance the reasoning capabilities of CodeLLMs and improve code generation.
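To make the contrast between the two orderings concrete, below is a minimal sketch of how the supervision target of an SFT sample could be assembled under the classic CoT-then-code paradigm versus the code-then-CoT paradigm studied here. The field names, prompt template, and helper function are illustrative assumptions, not the paper's released code (see the repository linked below for the actual dataset).

```python
# Minimal sketch (illustrative assumptions, not the paper's released code)
# of the two SFT target orderings compared in the paper.

def build_target(cot: str, code: str, code_first: bool) -> str:
    """Assemble the supervision target for one training example."""
    if code_first:
        # Proposed ordering: the model emits the solution, then explains it.
        return f"```python\n{code}\n```\n\nExplanation:\n{cot}"
    # Classic CoT ordering: reason step by step, then emit the solution.
    return f"Reasoning:\n{cot}\n\n```python\n{code}\n```"


example = {
    "instruction": "Write a function that returns the n-th Fibonacci number.",
    "cot": "Iterate n times, keeping the last two values of the sequence ...",
    "code": (
        "def fib(n):\n"
        "    a, b = 0, 1\n"
        "    for _ in range(n):\n"
        "        a, b = b, a + b\n"
        "    return a"
    ),
}

# Classic CoT-then-code target vs. the code-then-CoT target studied here.
classic = build_target(example["cot"], example["code"], code_first=False)
proposed = build_target(example["cot"], example["code"], code_first=True)
print(proposed)
```

Both variants contain the same tokens; only their order in the training target changes, which is exactly the manipulation behind the reported 9.86% relative improvement.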
Lay Summary: Large language models (LLMs), such as ChatGPT, are increasingly used to assist programmers in generating code. One promising technique to improve their performance is Chain-of-Thought (CoT), where the model is guided to explain its reasoning step by step before providing an answer. This approach has worked well for solving complex logic problems, but does it help when the goal is to write code?
In our study, we explore how CoT reasoning influences code generation. We carefully designed new training data to separate the reasoning process from the final code output, allowing us to study how models learn to reason and code. Surprisingly, we found that the traditional approach—reasoning first, then generating code—does not significantly improve code quality. Reversing the order works better: if the model writes the code first and then explains its reasoning, performance improves significantly.
This insight challenges common assumptions about how to teach models to think and program and opens up new ways to make AI coding assistants more reliable and effective.
Link To Code: https://github.com/richardodliu/OpenSyntheticCC
Primary Area: Deep Learning->Large Language Models
Keywords: Code Generation, Supervised Fine-Tuning, Chain-of-Thought, CodeLLM
Submission Number: 10969