Abstract: We present a holistic investigation of detecting LLM-generated academic writing, providing a dataset, a user study, and detection algorithms, aiming to inspire broader community effort to address concerns about LLM misuse in academia. We first introduce GPABench2, a benchmarking dataset of 2.385 million samples of human-written, GPT-written, GPT-completed, and GPT-polished research-paper abstracts in computer science, physics, and the humanities & social sciences. Through a user study with 155 participants, we show how difficult it is for human readers, including experienced faculty and researchers, to identify GPT-generated abstracts. Finally, we present CheckGPT, an LLM-content detector consisting of a general representation module and an attentive-BiLSTM classification module, which is highly accurate and transferable.
Paper Type: long
Research Area: NLP Applications
Contribution Types: NLP engineering experiment, Publicly available software and/or pre-trained models, Data resources, Data analysis
Languages Studied: English
Consent To Share Submission Details: On behalf of all authors, we agree to the terms above to share our submission details.