NovelProbe: Novelty Detection of Machine-Generated Texts

ACL ARR 2026 January Submission6910 Authors

06 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: Novelty Detection; Plagiarism Detection; Membership Inference Attack; Pre-Training Data Detection
Abstract: Large language models can inadvertently reproduce memorized training passages, raising concerns about plagiarism, privacy, and deployment safety. Prior work on membership inference and pre-training data detection largely relies on surface likelihood signals, which fail to transfer to auditing settings involving machine-generated text. We present NovelProbe, a novelty detector that predicts whether a model’s generation reflects memorization or novel synthesis using pre-decode hidden activations. Since ground-truth pre-training membership is typically unavailable, we instead train our detector on the related classification task of in-context memorization. Through experiments on multiple open-weight models, we find that a detector trained on in-context memorization provides a strong and necessary signal for novelty detection, while also remaining effective for membership inference.
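The abstract describes training a classifier on a model's pre-decode hidden activations to separate memorized from novel generations. The paper does not specify the probe architecture; the sketch below illustrates the general probing pattern with a simple logistic-regression probe over synthetic stand-in activations (the feature distributions, dimensions, and labels here are hypothetical placeholders, not the authors' setup or data).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d = 64  # hypothetical hidden-state dimensionality

# Stand-in for pre-decode hidden activations: in the real setting these
# would be extracted from an open-weight LM before token decoding, with
# labels from the in-context memorization task described in the abstract.
memorized = rng.normal(loc=0.5, scale=1.0, size=(200, d))
novel = rng.normal(loc=-0.5, scale=1.0, size=(200, d))

X = np.vstack([memorized, novel])
y = np.concatenate([np.ones(200), np.zeros(200)])  # 1 = memorized

# A linear probe: if activations encode memorization, even a simple
# classifier on them should separate the two classes.
probe = LogisticRegression(max_iter=1000).fit(X, y)
train_acc = probe.score(X, y)
print(f"probe training accuracy: {train_acc:.2f}")
```

In an actual audit, the trained probe would be applied to activations from new generations to flag likely memorized outputs; the transfer claim in the abstract is that a probe trained on in-context memorization labels generalizes to this setting.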
Paper Type: Short
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: data influence; knowledge tracing/discovering/inducing; probing
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Publicly available software and/or pre-trained models, Data resources
Languages Studied: English
Submission Number: 6910