On Continually Tracing Origins of LLM-Generated Text and Its Application in Detecting Cheating in Student Coursework

Published: 2025. Last Modified: 16 Jan 2026. Big Data Cogn. Comput. 2025. License: CC BY-SA 4.0.
Abstract: Large language models (LLMs) have demonstrated remarkable capabilities in text generation, which also raise numerous concerns about their potential misuse, especially in educational exercises and academic writing. Accurately identifying and tracing the origins of LLM-generated content is crucial for accountability and transparency, and for ensuring the responsible use of LLMs in educational and academic environments. Previous methods either use binary classifiers to determine whether a piece of text was written by a human or generated by a specific LLM, or use multi-class classifiers to trace the source LLM within a fixed set. These methods, however, are restricted to one or several pre-specified LLMs and cannot generalize to the new LLMs that continually emerge. This study formulates source LLM tracing as class-incremental learning (CIL): new LLMs continually emerge, and a model incrementally learns to identify them without forgetting the old ones. A training-free continual learning method is then devised for this task. It uses a frozen encoder to continually extract prototypes for emerging LLMs and performs origin tracing by prototype matching after a carefully designed decorrelation step. For evaluation, two datasets are constructed, one in English and one in Chinese. Both simulate a scenario in which six LLMs emerge over time and are used to generate student essays, so that an LLM detector must incrementally expand its recognition scope as each new LLM appears. Experimental results show that the proposed method achieves an average accuracy of 97.04% on the English dataset and 91.23% on the Chinese dataset. These results validate the feasibility of continual origin tracing of LLM-generated text and verify its effectiveness in detecting cheating in student coursework.
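The abstract names the ingredients (frozen encoder, per-LLM prototypes, decorrelation, prototype matching) but not the exact decorrelation. The following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes each essay has already been mapped to a fixed-length vector by some frozen encoder (not shown; the toy 3-D vectors are placeholders), and it swaps in ridge-regularized Gram-matrix decorrelation, the closed-form approach used by training-free continual learners such as RanPAC. The class name `ContinualPrototypeTracer`, the `ridge` parameter, and the `llm_*` labels are all hypothetical.

```python
# Minimal sketch of training-free, class-incremental origin tracing.
# ASSUMPTIONS (not from the paper): features come from a frozen encoder (not shown);
# decorrelation uses a ridge-regularized Gram matrix, which may differ from the authors' step.


def solve(a, b):
    """Solve a @ x = b (a: n x n, b: n x m) by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


class ContinualPrototypeTracer:
    """Hypothetical tracer that keeps only running statistics, never raw essays."""

    def __init__(self, dim, ridge=1.0):
        self.dim = dim
        self.ridge = ridge  # hypothetical regularization strength (lambda)
        self.gram = [[0.0] * dim for _ in range(dim)]  # G = sum of x x^T over all essays seen
        self.proto_sums = {}  # label -> sum of feature vectors (count x class prototype)
        self.labels = []  # source LLMs in order of first appearance
        self.weights = None  # W = (G + lambda I)^-1 C, one column per LLM

    def add_task(self, features, labels):
        """Absorb a new stage of (feature, source-LLM) pairs; no gradient training."""
        for x, y in zip(features, labels):
            for i in range(self.dim):
                for j in range(self.dim):
                    self.gram[i][j] += x[i] * x[j]
            if y not in self.proto_sums:
                self.proto_sums[y] = [0.0] * self.dim
                self.labels.append(y)
            s = self.proto_sums[y]
            for i in range(self.dim):
                s[i] += x[i]
        # Decorrelate: regularize the Gram matrix, then solve for every LLM column at once.
        reg = [[self.gram[i][j] + (self.ridge if i == j else 0.0)
                for j in range(self.dim)] for i in range(self.dim)]
        protos = [[self.proto_sums[c][i] for c in self.labels] for i in range(self.dim)]
        self.weights = solve(reg, protos)

    def scores(self, x):
        """Match a feature vector against every decorrelated prototype."""
        return {c: sum(x[i] * self.weights[i][k] for i in range(self.dim))
                for k, c in enumerate(self.labels)}

    def predict(self, x):
        """Return the source LLM with the highest matching score."""
        s = self.scores(x)
        return max(s, key=s.get)


if __name__ == "__main__":
    tracer = ContinualPrototypeTracer(dim=3)
    # Stage 1: two LLMs are known (toy vectors stand in for frozen-encoder embeddings).
    tracer.add_task([[1.0, 0.1, 0.0], [0.9, 0.0, 0.1], [0.0, 1.0, 0.1], [0.1, 0.9, 0.0]],
                    ["llm_a", "llm_a", "llm_b", "llm_b"])
    # Stage 2: a new LLM emerges; only its essays are needed to extend the tracer.
    tracer.add_task([[0.1, 0.0, 1.0], [0.0, 0.1, 0.9]], ["llm_c", "llm_c"])
    for q in ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]):
        print(q, "->", tracer.predict(q))
```

Because the Gram matrix and the per-LLM sums only accumulate, adding LLMs one stage at a time yields the same weights as fitting on all stages jointly, so earlier LLMs are not forgotten; the per-stage cost is one d x d linear solve, where d is the feature dimension.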