DOPE: Decoy Oriented Perturbation Encapsulation

ACL ARR 2026 January Submission10729 Authors

06 Jan 2026 (modified: 07 Jun 2026), ACL ARR 2026 January Submission, CC BY 4.0

Keywords: AI in Education, Document Watermarking, Authorship Attribution, Assessment Security, LLM Agents, Academic Integrity, Benchmark

Abstract: Multimodal large language models (MLLMs) can directly consume exam documents, threatening conventional assessments and academic integrity. We present DOPE (Decoy-Oriented Perturbation Encapsulation), a document-layer defense framework that embeds semantic decoys into PDF and HTML assessments to exploit render–parse discrepancies in MLLM pipelines. By instrumenting exams at authoring time, DOPE provides model-agnostic prevention (confounding or blocking automated solving) and detection (flagging blind AI reliance) without relying on conventional one-shot classifiers. We formalize both the prevention and detection tasks and introduce FEWSORT-Q, an LLM-guided pipeline for generating question-level semantic decoys, along with FEWSORT-D, which encapsulates them into watermarked documents. We evaluate on INTEGRITY-BENCH, a paired benchmark of 1,826 exams (PDF and HTML) derived from public QA datasets and OpenCourseWare. Against black-box MLLMs from OpenAI and Anthropic, DOPE achieves a 91.4% detection rate at an 8.7% false-positive rate using an LLM-as-judge verifier, and it prevents successful completion or induces decoy-aligned failures in 96.3% of attempts. We release INTEGRITY-BENCH, our toolkit, and evaluation code to enable reproducible research on document-layer defenses for academic integrity.

Paper Type: Long

Research Area: Resources and Evaluation

Research Area Keywords: AI in Education and Academic Integrity

Contribution Types: NLP engineering experiment, Data resources

Languages Studied: English

Submission Number: 10729
