Open-source Pipeline for Automated Detection of Unrelated Citations

ACL ARR 2025 May Submission 6784 Authors

20 May 2025 (modified: 03 Jul 2025), ACL ARR 2025 May Submission, readers: Everyone, license: CC BY 4.0
Abstract: Citations are important for ensuring the integrity of scientific literature. However, automated citation verification remains challenging because dedicated datasets are lacking and the problem has received limited research attention. In this paper, we introduce an open-source, automated pipeline that integrates citation retrieval with unrelated citation detection. To assess the pipeline's reliability, we built an annotated dataset, which others can also use for citation verification tasks. We further validated the pipeline in practice, where it successfully identified unrelated citations in real scientific papers. Our work helps research integrity scientists identify potential scientific fraud more efficiently.
Paper Type: Short
Research Area: Resources and Evaluation
Research Area Keywords: automatic creation and evaluation of language resources, NLP datasets, benchmarking
Contribution Types: NLP engineering experiment, Publicly available software and/or pre-trained models, Data resources
Languages Studied: English
Submission Number: 6784