Keywords: AI in Education, Document Watermarking, Authorship Attribution, Assessment Security, LLM Agents, Academic Integrity, Benchmark
Abstract: Multimodal large language models (MLLMs) can directly consume exam documents, threatening conventional assessments and academic integrity. We present DOPE (Decoy-Oriented Perturbation Encapsulation), a document-layer defense framework that embeds semantic decoys into PDF and HTML assessments to exploit render–parse discrepancies in MLLM pipelines. By instrumenting exams at authoring time, DOPE provides model-agnostic prevention (confounding or blocking automated solving) and detection (flagging blind AI reliance) without relying on conventional one-shot classifiers. We formalize both the prevention and detection tasks and introduce FEWSORT-Q, an LLM-guided pipeline for generating question-level semantic decoys, along with FEWSORT-D, which encapsulates them into watermarked documents. We evaluate on INTEGRITY-BENCH, a paired benchmark of 1,826 exams (PDF and HTML) derived from public QA datasets and OpenCourseWare. Against black-box MLLMs from OpenAI and Anthropic, DOPE achieves a 91.4% detection rate at an 8.7% false-positive rate using an LLM-as-judge verifier, and it prevents successful completion or induces decoy-aligned failures in 96.3% of attempts. We release INTEGRITY-BENCH, our toolkit, and evaluation code to enable reproducible research on document-layer defenses for academic integrity.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: AI in Education and Academic Integrity
Contribution Types: NLP engineering experiment, Data resources
Languages Studied: English
Submission Number: 10729