Beyond Easy Wins: A Text Hardness-Aware Benchmark for LLM-generated Text Detection

ACL ARR 2025 May Submission2475 Authors

19 May 2025 (modified: 29 Jul 2025), ACL ARR 2025 May Submission, CC BY 4.0

Abstract: We present a novel evaluation paradigm for AI text detectors that prioritizes real-world, equitable assessment. Current approaches predominantly report conventional metrics such as AUROC, overlooking the fact that even modest false positive rates are a critical impediment to the practical deployment of detection systems. Furthermore, real-world deployment requires a predetermined threshold, which makes detector stability (i.e., the maintenance of consistent performance across diverse domains and adversarial scenarios) a critical factor. These aspects have been largely ignored in previous research and benchmarks. Our benchmark, SHIELD, addresses these limitations by integrating both reliability and stability factors into a unified evaluation metric designed for practical assessment. In addition, we develop a post-hoc, model-agnostic humanification framework that modifies AI text to more closely resemble human authorship and incorporates a controllable hardness parameter. This hardness-aware approach challenges the ability of current SOTA zero-shot detection methods to maintain both reliability and stability. (Data and code will be released on GitHub upon acceptance.)
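The abstract's point about predetermined thresholds can be illustrated with a minimal sketch. This is not the SHIELD metric, which the abstract does not define. It only shows, under stated assumptions, how a threshold calibrated once on human text at a target false positive rate can behave differently across domains. The function names (`threshold_at_fpr`, `rates_at_threshold`, `per_domain_report`) and the convention that higher scores mean "more AI-like" are assumptions made for illustration.

```python
import numpy as np

def threshold_at_fpr(human_scores, target_fpr):
    """Return a threshold such that at most `target_fpr` of the given
    human-written texts score strictly above it.
    Assumption: a higher detector score means 'more AI-like'."""
    scores = np.sort(np.asarray(human_scores, dtype=float))
    n = len(scores)
    # Number of human texts we tolerate being flagged (capped at n - 1).
    k = min(int(np.floor(target_fpr * n)), n - 1)
    # The (k + 1)-th largest human score: at most k scores lie strictly above it.
    return scores[n - k - 1]

def rates_at_threshold(human_scores, ai_scores, threshold):
    """False positive rate and true positive rate at a fixed threshold."""
    human = np.asarray(human_scores, dtype=float)
    ai = np.asarray(ai_scores, dtype=float)
    fpr = float(np.mean(human > threshold))
    tpr = float(np.mean(ai > threshold))
    return fpr, tpr

def per_domain_report(calib_human, domains, target_fpr=0.01):
    """Calibrate the threshold once, then apply it unchanged to every domain.
    `domains` maps a domain name to a (human_scores, ai_scores) pair."""
    thr = threshold_at_fpr(calib_human, target_fpr)
    report = {name: rates_at_threshold(h, a, thr) for name, (h, a) in domains.items()}
    return thr, report
```

A detector can rank texts well within each domain, and so obtain a high AUROC, while the per-domain FPR and TPR at the shared threshold drift apart. The spread of those per-domain rates is one simple way to look at the kind of stability the abstract describes.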

Paper Type: Long

Research Area: Resources and Evaluation

Research Area Keywords: AI-text detection, Large language models, LLM-generated text detection, Text humanification

Contribution Types: Data resources

Languages Studied: English

Submission Number: 2475
