Track: Type A (Regular Papers)
Keywords: Prompt Injection, CS Education
Abstract: Large Language Models (LLMs) such as ChatGPT, GitHub Copilot, and Gemini have introduced new challenges to academic integrity by enabling students to generate complete programming assignments with minimal effort. In our work, we investigate whether hidden prompt injection can be used to influence LLM behavior in hidden ways, with the goal of detecting or distinguishing AI-generated code from authentic student work. We tested a set of prompt injection techniques across multiple file formats and models, targeting three behavioral objectives: refusal to solve, subtle error insertion, and excessive commenting. Results show that visible, semantically embedded instructions, particularly those delivered through code comments, successfully triggered model-specific behaviors, while injections relying on encoded hyperlinks or metadata were uniformly ignored. Also, invisibility remained a key limitation, as alignment mechanisms frequently caused models to disclose or explain injected behavior. These findings suggest that prompt injection can serve as a possible method for embedding consistent output patterns, but further refinement is needed to achieve both universality and invisibility in educational detection scenarios.
Serve As Reviewer: ~Theodor_Schnitzler1
Submission Number: 68
Loading