Classification of text fragments in available anti-plagiarism tools without access to the source file

ACL ARR 2026 January Submission4019 Authors

05 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · Readers: Everyone · CC BY 4.0

Keywords: diploma thesis documents, document layout understanding, text classification

Abstract: As part of the development of the Unified Anti-Plagiarism System (JSA), a nationwide Polish platform for detecting plagiarism in theses and other academic documents, research was conducted to improve the text extraction process. JSA operates solely on the text content extracted from documents, without access to the original source files, which rules out multimodal approaches based on document layout. As a result, a new method was developed that identifies fragment types by analyzing character strings alone.

Paper Type: Short

Research Area: NLP Applications

Research Area Keywords: educational applications

Contribution Types: NLP engineering experiment, Approaches to low-resource settings

Languages Studied: Polish

Submission Number: 4019
