Universal Prompt Injection Techniques for Detecting LLM-based Assignment Fraud

Dorina Sîli; Bastian Küppers; Theodor Schnitzler

Universal Prompt Injection Techniques for Detecting LLM-based Assignment Fraud

Dorina Sîli, Bastian Küppers, Theodor Schnitzler

Published: 15 Oct 2025, Last Modified: 31 Oct 2025BNAIC/BeNeLearn 2025 OralEveryoneRevisionsBibTeXCC BY 4.0

Track: Type A (Regular Papers)

Keywords: Prompt Injection, CS Education

Abstract: Large Language Models (LLMs) such as ChatGPT, GitHub Copilot, and Gemini have introduced new challenges to academic integrity by enabling students to generate complete programming assignments with minimal effort. In our work, we investigate whether hidden prompt injection can be used to influence LLM behavior in hidden ways, with the goal of detecting or distinguishing AI-generated code from authentic student work. We tested a set of prompt injection techniques across multiple file formats and models, targeting three behavioral objectives: refusal to solve, subtle error insertion, and excessive commenting. Results show that visible, semantically embedded instructions, particularly those delivered through code comments, successfully triggered model-specific behaviors, while injections relying on encoded hyperlinks or metadata were uniformly ignored. Also, invisibility remained a key limitation, as alignment mechanisms frequently caused models to disclose or explain injected behavior. These findings suggest that prompt injection can serve as a possible method for embedding consistent output patterns, but further refinement is needed to achieve both universality and invisibility in educational detection scenarios.

Serve As Reviewer: ~Theodor_Schnitzler1

Submission Number: 68

Loading