Classification of text fragments in available anti-plagiarism tools without access to the source file
Keywords: diploma theses documents, document layout understanding, text classification
Abstract: As part of the development of the Unified Anti-Plagiarism System (JSA), a Polish nationwide platform for detecting plagiarism in theses and other academic documents, research was conducted to improve the text extraction process. JSA operates solely on the text content extracted from documents, without access to the original source files, which rules out multimodal approaches based on document layout. To address this, a new method was developed that identifies the types of text fragments based solely on analysis of their character strings.
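The abstract does not describe the classifier itself. The sketch below is purely illustrative and is not the authors' method or the JSA implementation. It assumes a simple rule-based approach with hypothetical fragment labels (`heading`, `body`, `bibliography_entry`, `table_row`) and hand-picked thresholds. Its purpose is to show what assigning a fragment type from the character string alone, with no layout information, could look like.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns and labels chosen for illustration; not the JSA label set.
SECTION_NUMBER = re.compile(r"^\d+(\.\d+)*\.?\s")  # e.g. "1. ", "2.3 "
CITATION_MARKER = re.compile(r"^\[\d+\]")           # e.g. "[12]"
COLUMN_GAP = re.compile(r"\s{2,}|\t")                # tab or run of 2+ whitespace chars


@dataclass
class FragmentFeatures:
    length: int
    upper_ratio: float    # share of uppercase characters among letters
    digit_ratio: float    # share of digits among all characters
    ends_with_period: bool
    column_gaps: int      # likely column separators left over from a table


def extract_features(text: str) -> FragmentFeatures:
    """Compute character-level features from the fragment string alone."""
    letters = [c for c in text if c.isalpha()]
    upper = sum(1 for c in letters if c.isupper())
    digits = sum(1 for c in text if c.isdigit())
    return FragmentFeatures(
        length=len(text),
        upper_ratio=upper / len(letters) if letters else 0.0,
        digit_ratio=digits / len(text) if text else 0.0,
        ends_with_period=text.endswith("."),
        column_gaps=len(COLUMN_GAP.findall(text)),
    )


def classify_fragment(fragment: str) -> str:
    """Assign a hypothetical fragment type using hand-written thresholds."""
    text = fragment.strip()
    if not text:
        return "empty"
    f = extract_features(text)
    # Several column gaps plus many digits suggest a flattened table row.
    if f.column_gaps >= 2 and f.digit_ratio > 0.2:
        return "table_row"
    # A leading "[n]" marker suggests a numbered bibliography entry.
    if CITATION_MARKER.match(text):
        return "bibliography_entry"
    # Short, unterminated, and either mostly uppercase or section-numbered.
    if f.length < 80 and not f.ends_with_period and (
        f.upper_ratio > 0.6 or SECTION_NUMBER.match(text)
    ):
        return "heading"
    return "body"


if __name__ == "__main__":
    samples = [
        "1. Wstęp",
        "STRESZCZENIE",
        "[1] Kowalski J., Metody wykrywania plagiatu, Warszawa 2019.",
        "Rok\t2019\t2020\t2021",
        "Niniejsza praca dotyczy analizy tekstu wydobytego z dokumentów.",
    ]
    for s in samples:
        print(f"{classify_fragment(s):<20} {s!r}")
```

In practice, fixed thresholds like these would likely be replaced by a trained classifier over similar string-level features. The rule form is used here only because it makes each feature's role easy to see.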
Paper Type: Short
Research Area: NLP Applications
Research Area Keywords: educational applications
Contribution Types: NLP engineering experiment, Approaches to low-resource settings
Languages Studied: Polish
Submission Number: 4019