Abstract: The emergence of large language models (LLMs), such as Generative Pre-trained Transformer 4 (GPT-4) used by ChatGPT, has profoundly affected both academia and the broader community. While these models offer many ways to transform how people work and study, they have also drawn significant attention for their potential negative consequences, such as generating academic reports or papers with little or no human contribution. Consequently, researchers have focused on developing detectors to counter the misuse of LLMs. However, most existing work prioritizes higher accuracy on restricted datasets and neglects generalizability. This limitation hinders practical use in real-life scenarios, where reliability is paramount. In this paper, we present a comprehensive analysis of how prompts influence the text generated by LLMs and highlight a potential lack of robustness in one of the current state-of-the-art GPT detectors. To mitigate the misuse of LLMs in academic writing, we propose a reference-based Siamese detector that takes a pair of texts: one as the inquiry and the other as the reference. Our method effectively addresses the lack of robustness and significantly improves baseline performance in challenging scenarios, by approximately 25% to 67%.
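The abstract does not specify the detector's encoder, training objective, or decision rule. The following is therefore only a minimal sketch of the general reference-based Siamese idea. It assumes that a single shared encoder embeds both the inquiry and the reference, and that the reference is a text known to be LLM-generated. The hashed character-trigram encoder, the names `encode`, `similarity`, `is_machine_generated`, `DIM`, and the 0.5 threshold are all hypothetical stand-ins for illustration, not the authors' model.

```python
import math
import zlib
from collections import Counter

DIM = 256  # hypothetical embedding size


def encode(text: str, dim: int = DIM) -> list[float]:
    """Shared encoder (hypothetical): hashed character-trigram counts, L2-normalized.

    Stands in for a learned network; both branches of the Siamese pair
    call this same function, which is what makes the setup "Siamese".
    """
    text = text.lower()
    vec = [0.0] * dim
    grams = Counter(text[i:i + 3] for i in range(len(text) - 2))
    for gram, count in grams.items():
        # crc32 gives a deterministic hash, unlike Python's salted hash()
        vec[zlib.crc32(gram.encode("utf-8")) % dim] += count
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def similarity(inquiry: str, reference: str) -> float:
    """Cosine similarity between the two embeddings from the shared encoder."""
    a, b = encode(inquiry), encode(reference)
    return sum(x * y for x, y in zip(a, b))


def is_machine_generated(inquiry: str, llm_reference: str,
                         threshold: float = 0.5) -> bool:
    """Hypothetical decision rule: flag the inquiry when it is at least
    `threshold`-similar to an LLM-generated reference text."""
    return similarity(inquiry, llm_reference) >= threshold
```

Because both texts pass through the same `encode`, the comparison is symmetric and depends only on how close the pair is in embedding space. In a real detector the trigram encoder would be replaced by a trained network and the threshold tuned on validation data.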
Paper Type: long
Research Area: NLP Applications