Metric Learning for Detection of Large Language Model Generated Texts

23 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: metric learning, kernel learning, and sparse coding
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: LLM text detection, synthetic text detection, metric learning, same-context triplet training
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: This paper presents a new paradigm of metric-based detection for LLM-generated texts that balances computational cost, accessibility, and performance
Abstract: More effort is being put into improving Large Language Models' (LLM) capabilities than into dealing with their implications. Current LLMs can generate texts that are seemingly indistinguishable from those written by human experts. While offering great quality-of-life improvements, such breakthroughs also pose new challenges in education, science, and many other areas. In addition, current approaches to LLM text detection are either computationally expensive or require access to the LLMs' internal computations, both of which hinder public accessibility. Motivated by this, this paper presents a new paradigm of metric-based detection for LLM-generated texts that balances computational cost, accessibility, and performance. Specifically, detection is performed by evaluating the similarity between a given text and an equivalent example generated by an LLM, thereby determining the former's origin. Architecturally, the detection framework comprises a text embedding model and a metric model. Currently, the embedding component is a pretrained language model. We focus on designing the metric component, which is trained with triplets of same-context instances to widen the distance between human responses and LLM ones while reducing the distance among LLM texts. Additionally, we develop and publish four datasets totaling over 85,000 prompts and triplets of responses, in which one response is written by a human and two are generated by GPT-3.5 Turbo, for benchmarking and public use. Experiments show that our best architectures maintain F1 scores between 0.87 and 0.95 across the tested corpora in both same-corpus and out-of-corpus settings, with or without paraphrasing.
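To make the training objective concrete, below is a minimal sketch of the same-context triplet setup the abstract describes, assuming a PyTorch implementation. The class name, layer sizes, embedding dimension, and margin are illustrative placeholders, not the authors' code; any triplet-style metric loss over frozen pretrained embeddings would fit the described framework.

```python
# Sketch of same-context triplet training: two LLM responses to the same
# prompt serve as anchor and positive, the human response as negative, so
# the learned metric pulls LLM texts together and pushes human texts away.
import torch
import torch.nn as nn

class MetricHead(nn.Module):
    """Hypothetical metric model mapping frozen text embeddings
    (e.g., 768-dim vectors from a pretrained language model) into a
    space where origin is decided by distance."""
    def __init__(self, dim_in: int = 768, dim_out: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim_in, 256), nn.ReLU(), nn.Linear(256, dim_out)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

metric = MetricHead()
loss_fn = nn.TripletMarginLoss(margin=1.0)  # margin is an assumed value

# Dummy batch of embedded same-context triplets (batch size 4).
llm_a, llm_b, human = (torch.randn(4, 768) for _ in range(3))
loss = loss_fn(metric(llm_a), metric(llm_b), metric(human))
loss.backward()  # gradients shrink d(llm_a, llm_b), grow d(llm_a, human)
```

At inference time, per the abstract, a query text is compared against an equivalent LLM-generated response in this learned metric space, and its distance determines whether it is classified as human-written or LLM-generated.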
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7753