The Deepfake Defense Stack: Why No Single Layer Works and How They Must Compose

05 Apr 2026 (modified: 24 Apr 2026), Under review for TMLR, CC BY 4.0
Abstract: Every defense against AI-synthesized media, whether passive detection, invisible watermarking, or content provenance, has been shown to fail when deployed in isolation. Detectors suffer 45-50% accuracy degradation from laboratory to deployment and collapse on outputs from unseen generator architectures. Watermarks are removable by regeneration attacks and screenshot capture. Provenance metadata is stripped by most social media platforms. Yet no prior work has formally analyzed how these defenses compose: which attack classes each layer blocks, where cascade failures propagate, and what residual vulnerabilities survive the full stack. We present the first composition analysis of deepfake defenses. Using a Defense Composition Matrix covering 58 detection methods, 23 proactive defense systems, and 7 adversarial attack classes, we map how three defense layers (detection, watermarking, provenance) interact with each of those attack classes. We identify two attack classes that penetrate all three layers simultaneously, one that bypasses the primary trust layer, and three emergent composition patterns where stacking defenses creates vulnerabilities absent from any individual layer. We formulate a Detection Ceiling Conjecture arguing, with supporting evidence, that post-hoc detection faces an information-theoretic bound that provenance-based approaches do not share. Our composition analysis draws on 190 papers spanning generation, detection, proactive defense, adversarial attacks, benchmarks, and the societal impact of AI-synthesized media across the 2014-2026 period. We provide focused technical background for each defense layer (Sections 3-5) sufficient to support the composition analysis; readers seeking comprehensive coverage of individual layers should consult the dedicated surveys cited in Table 1. We identify eight open problems with falsifiable hypotheses and proposed experimental protocols.
Submission Type: Long submission (more than 12 pages of main content)
Changes Since Last Submission: This revision corrects inconsistencies and adds minor content updates:
- Abstract corrected to match the Defense Composition Matrix
- Fraud telemetry updated with cumulative figure (Surfshark, April 2026) in Sections 1 and 10.1
- Sora timeline corrected
- Dual-use acknowledgment added to Section 10.3, addressing the risk that the Defense Composition Matrix could serve as an adversarial blueprint
No structural, theoretical, or methodological changes were made.
Assigned Action Editor: ~Fernando_Perez-Cruz1
Submission Number: 8271