Predicting the presence of inline citations in academic text using binary classification

Published: 20 Mar 2023, Last Modified: 18 Apr 2023, Venue: NoDaLiDa 2023, Readers: Everyone
Keywords: Inline Citations, SciBERT, Text classification, Academic text
TL;DR: We try predicting whether or not inline citations should appear in academic text using a dataset of pre-processed academic articles and SciBERT.
Abstract: Properly citing sources is a crucial component of any good-quality academic paper. The goal of this study was to determine what kind of accuracy we could reach in predicting whether or not a sentence should contain an inline citation using a simple binary classification model. To that end, we fine-tuned SciBERT on both an imbalanced and a balanced dataset containing sentences with and without inline citations. We achieved an overall accuracy of over 0.92, suggesting that language patterns alone could be used to predict where inline citations should appear with some degree of accuracy.
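The task framing in the abstract (sentences labelled by whether they contain an inline citation, in an imbalanced and a balanced variant) can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the authors' pipeline: the `CITATION` pattern, the `label_sentence` and `balance` helpers, the choice to strip the citation marker from the text, and balancing by undersampling are all hypothetical. The resulting (text, label) pairs are the kind of input a binary classifier such as a fine-tuned SciBERT would consume.

```python
import random
import re

# Hypothetical pattern (an assumption, not from the paper): matches author-year
# citations such as "(Smith et al., 2020)" and numeric ones such as "[3, 4]".
CITATION = re.compile(r"\s*(\([A-Z][^()]*\d{4}[a-z]?\)|\[\d+(?:\s*[,-]\s*\d+)*\])")


def label_sentence(sentence):
    """Return (text_without_citation, label); label is 1 if a citation was found."""
    stripped, n_found = CITATION.subn("", sentence)
    return stripped.strip(), int(n_found > 0)


def balance(examples, seed=0):
    """Undersample the majority class so both labels are equally frequent
    (one assumed way of obtaining a balanced dataset)."""
    rng = random.Random(seed)
    pos = [e for e in examples if e[1] == 1]
    neg = [e for e in examples if e[1] == 0]
    k = min(len(pos), len(neg))
    return rng.sample(pos, k) + rng.sample(neg, k)


# Toy sentences invented for illustration only.
sentences = [
    "Transformers dominate NLP benchmarks (Vaswani et al., 2017).",
    "Prior work reports similar gains [3, 4].",
    "We describe our setup below.",
    "The rest of the paper is organised as follows.",
    "Results are shown in the next section.",
]

examples = [label_sentence(s) for s in sentences]  # imbalanced: 2 positive, 3 negative
balanced = balance(examples)                       # balanced: 2 positive, 2 negative
```

Stripping the marker in this sketch means a classifier would have to rely on the surrounding wording rather than on the citation string itself, which is one way to read the abstract's claim about language patterns.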