AEyeDE: An Attention-Based Attribution Framework for AI-Generated Text Detection

ACL ARR 2026 January Submission10639 Authors

06 Jan 2026 (modified: 20 Mar 2026), ACL ARR 2026 January Submission, Readers: Everyone, License: CC BY 4.0

Keywords: LLM, AI text detection, Author attribution, Explainable AI, Attention maps

Abstract: Detecting AI-generated text is becoming increasingly challenging as modern language models approach human-level fluency and can evade detectors that rely on surface statistics or likelihood-based signals. We propose AEyeDE, an attribution-driven approach to human-AI authorship detection that leverages model attention as a discriminative signal. Specifically, we extract attention-based attribution matrices for both human- and AI-generated text using a proxy Transformer model with white-box access, and we train a lightweight convolutional neural network (CNN) to learn representations from these attribution maps. Across standard benchmarks, our method achieves performance competitive with state-of-the-art detectors in both encoder-decoder machine translation and decoder-only open-ended generation settings. We further provide evidence that attention maps exhibit detectable, recurring local structures whose relative frequency differs reliably between human and AI text across datasets and proxy models. We will make the code publicly available for future research.
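The pipeline described in the abstract (attention map from a proxy model, then a convolutional detector of local structure) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: the toy embeddings stand in for a proxy Transformer's hidden states, `attention_map` uses unprojected dot-product attention rather than real model weights, and the fixed 2x2 kernel in `diagonal_score` stands in for a filter that a trained CNN would learn.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_map(embeddings):
    """Toy single-head self-attention weights: softmax(X X^T / sqrt(d)).

    Assumption: no learned query/key projections; a real proxy model
    would supply its own attention weights instead.
    """
    d = len(embeddings[0])
    scores = [
        [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in embeddings]
        for q in embeddings
    ]
    return [softmax(row) for row in scores]


def cross_correlate_valid(matrix, kernel):
    """Slide `kernel` over `matrix` without padding (the 'convolution' used in CNNs)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(matrix) - kh + 1
    out_w = len(matrix[0]) - kw + 1
    return [
        [
            sum(
                matrix[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def diagonal_score(attn):
    """Mean ReLU response of a hand-picked 2x2 diagonal-pattern kernel.

    Hypothetical feature: a trained CNN would learn many such kernels
    and feed their responses to a classifier.
    """
    kernel = [[1.0, -1.0], [-1.0, 1.0]]
    fmap = cross_correlate_valid(attn, kernel)
    activations = [max(0.0, v) for row in fmap for v in row]
    return sum(activations) / len(activations)


if __name__ == "__main__":
    # Four toy 3-dimensional token embeddings (made-up values).
    tokens = [[1.0, 0.0, 0.5], [0.9, 0.1, 0.4], [0.0, 1.0, 0.2], [0.1, 0.8, 0.3]]
    attn = attention_map(tokens)
    print("diagonal-pattern score:", diagonal_score(attn))
```

In this sketch the scalar score plays the role of one feature channel; comparing the distribution of such responses between human-written and AI-generated inputs is the kind of local-structure frequency difference the abstract refers to.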

Paper Type: Long

Research Area: Special Theme (conference specific)

Research Area Keywords: Interpretability and Analysis of Models for NLP

Contribution Types: Model analysis & interpretability

Languages Studied: English, German, French, Arabic

Submission Number: 10639
