Human-in-the-loop Detection of AI-generated Text via Grammatical Patterns

23 Sept 2023 (modified: 11 Feb 2024)Submitted to ICLR 2024EveryoneRevisionsBibTeX
Supplementary Material: pdf
Primary Area: societal considerations including fairness, safety, privacy
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: detection, language models
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: The increasing proliferation of large language models (LLMs) has raised significant concerns about the detection of AI-written text. Ideally, the detection method should be accurate (in particular, it should not falsely accuse humans of using AI-generated text), and interpretable (it should provide a decision as to why the text was detected as either human or AI-generated). Existing methods tend to fall short of one or both of these requirements, and recent work has even shown that detection is impossible in the full generality. In this work, we focus on the problem of detecting AI-generated text in a domain where a training dataset of human-written samples is readily available. Our key insight is to learn interpretable grammatical patterns that are highly indicative of human or AI written text. The most useful of these patterns can then be given to humans as part of a human-in-the-loop approach. In our experimental evaluation, we show that the approach can effectively detect AI-written text in a variety of domains and generalize to different language models. Our results in a human trial show an improvement in the detection accuracy from $43$% to $86$%, demonstrating the effectiveness of the human-in-the-loop approach. We also show that the method is robust to different ways of prompting LLM to generate human-like patterns. Overall, our study demonstrates that AI text can be accurately and interpretably detected using a human-in-the-loop approach.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7885
Loading