Abstract: As large language models (LLMs) have become more advanced, generating code to solve exercises in programming courses has become significantly easier. However, this convenience raises concerns about over-reliance on these tools, which may hinder students from developing independent coding skills. To address this concern, we introduce an LLM-based detector that not only detects LLM-generated code but also explains the reasons for its judgments. These reasons provide insight into the characteristics of LLM-generated code and enhance the transparency of the detection process. We evaluate the detector in an introductory Python programming course, achieving over 99% accuracy. Additionally, instructors manually reviewed the reasons provided by the detector and verified that 64.7% of the reasons for classifying code as LLM-generated were appropriate. These reasons can also serve as feedback, helping students improve their coding skills by understanding the characteristics of expert-level LLM-generated code.
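To make the detect-and-explain idea concrete, the sketch below shows one way such a detector could be structured: prompt an LLM to return both a label and a list of reasons, then parse the response. This is a minimal sketch under stated assumptions, not the paper's implementation; the prompt wording, the placeholder call_llm client, and the JSON output format are all illustrative.

import json
from typing import Callable

# Illustrative sketch only: the paper's actual prompt, model, and output format
# are not reproduced here. `call_llm` stands in for any chat-completion client.
DETECTOR_PROMPT = (
    "You are reviewing a submission from an introductory Python course. "
    "Decide whether the code was written by a student or generated by an LLM. "
    'Respond with JSON: {"label": "student" | "llm", "reasons": ["..."]}'
)

def detect_llm_code(code: str, call_llm: Callable[[str], str]) -> dict:
    """Classify a submission and return a label plus human-readable reasons."""
    prompt = DETECTOR_PROMPT + "\n\nSubmission:\n" + code
    return json.loads(call_llm(prompt))  # expects {"label": ..., "reasons": [...]}

if __name__ == "__main__":
    # Stubbed LLM call so the sketch runs without an API key.
    fake_llm = lambda prompt: json.dumps({
        "label": "llm",
        "reasons": [
            "Consistently idiomatic, expert-level style",
            "Edge-case handling atypical of novice submissions",
        ],
    })
    print(detect_llm_code("def add(a, b):\n    return a + b", fake_llm))

The returned reasons are what instructors reviewed for appropriateness and what could be surfaced to students as feedback.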