Abstract: As large language models (LLMs) have become more advanced, generating code to solve exercises in programming courses has become significantly easier. However, this convenience raises concerns about over-reliance on these tools, which may hinder students from developing independent coding skills. To address this concern, we introduce an LLM-based detector that not only detects LLM-generated code but also explains the reasons for its judgments. These reasons provide insight into the characteristics of LLM-generated code and enhance the transparency of the detection process. We evaluate the detector in an introductory Python programming course, achieving over 99% accuracy. Additionally, instructors manually reviewed the reasons provided by the detector and verified that 64.7% of the reasons for classifying code as LLM-generated were appropriate. These reasons can also serve as feedback, helping students improve their coding skills by understanding the characteristics of expert-level LLM-generated code.
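To make the detect-and-explain idea concrete, the sketch below shows one way such a detector could be structured: prompt an LLM to return both a label and a list of reasons, then parse the response. This is a minimal sketch under stated assumptions, not the paper's implementation; the prompt wording, the placeholder call_llm client, and the JSON output format are all illustrative.

import json
from typing import Callable

# Illustrative sketch only: the paper's actual prompt, model, and output format
# are not reproduced here. `call_llm` stands in for any chat-completion client.
DETECTOR_PROMPT = (
    "You are reviewing a submission from an introductory Python course. "
    "Decide whether the code was written by a student or generated by an LLM. "
    'Respond with JSON: {"label": "student" | "llm", "reasons": ["..."]}'
)

def detect_llm_code(code: str, call_llm: Callable[[str], str]) -> dict:
    """Classify a submission and return a label plus human-readable reasons."""
    prompt = DETECTOR_PROMPT + "\n\nSubmission:\n" + code
    return json.loads(call_llm(prompt))  # expects {"label": ..., "reasons": [...]}

if __name__ == "__main__":
    # Stubbed LLM call so the sketch runs without an API key.
    fake_llm = lambda prompt: json.dumps({
        "label": "llm",
        "reasons": [
            "Consistently idiomatic, expert-level style",
            "Edge-case handling atypical of novice submissions",
        ],
    })
    print(detect_llm_code("def add(a, b):\n    return a + b", fake_llm))

The returned reasons are what instructors reviewed for appropriateness and what could be surfaced to students as feedback.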