Track: Long Paper Track (up to 9 pages)
Keywords: AI Safety, Plagiarism Detection, Hash Encoding, Document Similarity
TL;DR: With the rapid advancement of large language models (LLMs) and generative AI technology, how can we determine whether an article on the internet was written by a real person or generated by an LLM-based AI?
Abstract: With the rapid advancement of large language models (LLMs) and generative AI technology, a challenging question has emerged: how can we determine whether an article on the internet was written by a real person or generated by an LLM-based AI? As the barriers to training LLMs and running inference with them continue to fall, a vast number of AI-generated articles could allow an inexperienced person to pose as an expert in a particular field. Traditional text plagiarism detection techniques can address this issue to some extent, but each has its own limitations. We provide a systematic review of existing text plagiarism detection methods and propose a new benchmark to evaluate the accuracy of various text detection techniques across different scenarios.
Submission Number: 53