Keywords: Large Language Models, AI-text Detection, Paraphrase, Trustworthy AI
TL;DR: We propose BiScope, leveraging a novel bi-directional cross-entropy calculation method to detect AI-generated texts.
Abstract: Detecting text generated by Large Language Models (LLMs) is a pressing need in
order to identify and prevent misuse of these powerful models in a wide range of
applications, as such misuse can have highly undesirable consequences such as misinformation
and academic dishonesty. Given a piece of subject text, many existing detection
methods work by measuring how difficult it is for an LLM to predict the next token in
the text from its prefix. In this paper, we make a critical observation that
how well the output logits at the current token memorize the closely preceding input
tokens also provides strong evidence. We therefore propose a novel bi-directional
calculation method that measures the cross-entropy loss between the output
logits and the ground-truth next token (forward) and between the output logits and
the immediately preceding input token (backward). A classifier is trained to
make the final prediction based on statistics of these losses. We evaluate our
system, named BISCOPE, on texts generated by five of the latest commercial LLMs
across five heterogeneous datasets, including both natural language and code.
BISCOPE demonstrates superior detection accuracy and robustness compared to six
existing baseline methods, exceeding the detection accuracy of state-of-the-art
non-commercial methods by over 0.30 in F1 score and achieving over 0.95 detection F1 score
on average. It also outperforms the best commercial tool, GPTZero, which is based on
a commercial LLM trained with an enormous volume of data. Code is available at https://github.com/MarkGHX/BiScope.
Primary Area: Other (please use sparingly, only use the keyword field for more details)
Submission Number: 20457