CurveMark: Detecting AI-Generated Text via Probabilistic Curvature and Dynamic Semantic Watermarking

Published: 2025, Last Modified: 05 Jan 2026. Entropy 2025. CC BY-SA 4.0
Abstract: Large language models (LLMs) pose significant challenges to content authentication, as their sophisticated generation capabilities make distinguishing AI-produced text from human writing increasingly difficult. Current detection methods suffer from limited information capture, poor rate–distortion trade-offs, and vulnerability to adversarial perturbations. We present CurveMark, a novel dual-channel detection framework that combines probability curvature analysis with dynamic semantic watermarking, grounded in information-theoretic principles to maximize mutual information between text sources and observable features. To address the limitation of requiring prior knowledge of source models, we incorporate a Bayesian multi-hypothesis detection framework for statistical inference without prior assumptions. Our approach embeds imperceptible watermarks during generation via entropy-aware, semantically informed token selection and extracts complementary features from probability curvature patterns and watermark-specific metrics. Evaluation across multiple datasets and LLM architectures demonstrates 95.4% detection accuracy with minimal quality degradation (perplexity increase < 1.3), achieving 85–89% channel capacity utilization and robust performance under adversarial perturbations (72–94% information retention).
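The Bayesian multi-hypothesis step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `posterior_over_sources`, the candidate source labels, and the log-likelihood values are all hypothetical, and the uniform prior is only one possible reading of "without prior assumptions". The sketch shows how per-source log-likelihoods of a text might be turned into a posterior over candidate sources.

```python
import math


def posterior_over_sources(log_likelihoods, log_priors=None):
    """Return P(source | text) for each candidate source.

    log_likelihoods: dict mapping a source label to log P(text | source).
    log_priors: optional dict of log P(source); defaults to a uniform prior
                (an assumption made for this sketch, not taken from the paper).
    """
    names = list(log_likelihoods)
    if log_priors is None:
        uniform = -math.log(len(names))
        log_priors = {name: uniform for name in names}

    # Unnormalized log posterior: log P(text | source) + log P(source).
    joint = {name: log_likelihoods[name] + log_priors[name] for name in names}

    # Normalize with the log-sum-exp trick to avoid underflow on long texts.
    peak = max(joint.values())
    log_evidence = peak + math.log(sum(math.exp(v - peak) for v in joint.values()))
    return {name: math.exp(joint[name] - log_evidence) for name in names}


# Hypothetical log-likelihoods of one text under three candidate sources.
scores = {"human": -120.0, "model_a": -118.0, "model_b": -125.0}
posterior = posterior_over_sources(scores)
best_source = max(posterior, key=posterior.get)
```

In this sketch the text is attributed to whichever hypothesis has the largest posterior. A real detector would derive the likelihoods from the curvature and watermark features rather than from fixed numbers.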