Verifiable LLM-Generated Text Detection via Projected Semantic-Structural Distributions

ACL ARR 2026 January Submission10819 Authors

06 Jan 2026 (modified: 20 Mar 2026), ACL ARR 2026 January Submission, CC BY 4.0
Keywords: Machine-Generated Text Detection; Large Language Model; Model Distribution Estimation; Trustworthy AI
Abstract: The widespread deployment of large language models (LLMs) makes detecting LLM-generated text a critical security task. Existing methods, which rely primarily on output probabilities from proxy models or on single semantic features, suffer from distribution misalignment and limited interpretability. We observe that machine-generated text exhibits a directionally consistent, systematic translation relative to human-written text within the joint semantic-structural space. Accordingly, we propose ProSSD, a statistical framework that uses supervised subspace learning to extract compact features and constructs conditional semantic distributions based on syntactic structures. From a likelihood ratio test, we derive a modified Mahalanobis distance, weighted by the Wasserstein distance, as the discriminative metric. Experiments demonstrate ProSSD’s superior robustness and computational efficiency across cross-domain, cross-model, and adversarial scenarios. Furthermore, we reveal the phenomena of systematic semantic translation and semantic collapse in machine-generated text, offering interpretable statistical insights into LLM generation behaviors.
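The abstract's scoring idea can be sketched in a few lines. The code below is a hypothetical illustration, not the authors' implementation: it fits class-conditional Gaussians to synthetic "human" and "machine" feature projections (standing in for the learned subspace features), computes a likelihood-ratio-style difference of squared Mahalanobis distances, and weights it by the average per-dimension Wasserstein distance between the two training sets; all variable names and the averaging choice are assumptions.

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)

# Hypothetical low-dimensional projections of human vs. machine text.
# The machine cloud is translated (mean shift) and contracted (smaller
# variance), mimicking the "systematic translation" and "semantic
# collapse" phenomena described in the abstract.
human = rng.normal(loc=0.0, scale=1.0, size=(500, 4))
machine = rng.normal(loc=0.8, scale=0.6, size=(500, 4))

def mahalanobis_sq(x, mean, cov_inv):
    """Squared Mahalanobis distance of x to a Gaussian (mean, cov)."""
    d = x - mean
    return float(d @ cov_inv @ d)

mu_h, inv_h = human.mean(axis=0), np.linalg.inv(np.cov(human, rowvar=False))
mu_m, inv_m = machine.mean(axis=0), np.linalg.inv(np.cov(machine, rowvar=False))

# Assumption: weight the statistic by the mean 1-D Wasserstein distance
# between the two classes across feature dimensions.
w = np.mean([wasserstein_distance(human[:, j], machine[:, j])
             for j in range(human.shape[1])])

def score(x):
    # Likelihood-ratio-style statistic: positive when x lies closer to
    # the machine distribution than to the human one, scaled by w.
    return w * (mahalanobis_sq(x, mu_h, inv_h) - mahalanobis_sq(x, mu_m, inv_m))

m_scores = np.array([score(x) for x in machine])
h_scores = np.array([score(x) for x in human])
print(m_scores.mean() > h_scores.mean())  # machine texts score higher on average
```

Thresholding `score(x)` at zero then yields a detector; in the paper's setting the Gaussians would be conditioned on syntactic structure rather than fit globally.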
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: NLP Applications; Interpretability and Analysis of Models for NLP; Language Modeling
Languages Studied: English.
Submission Number: 10819