16 Nov 2021 (modified: 05 May 2023)
ACL ARR 2021 November Blind Submission
Readers: Everyone
Abstract: Accurate classification of citation intents in a scientific article provides a deeper contextual understanding of cited articles and better quantifies their contributions. This improves scientific literature platform capabilities such as search relevance and ranking. We present what is, to our knowledge, the most comprehensive survey of the performance of Transformer-based language models on the citation intent classification task, using the SciCite dataset. We make three recommendations. First, we propose reporting model performance as a distribution rather than as a single averaged value. This follows from our observation that model performance is sensitive to the choice of random seed, so that multiple finetuning runs produce wide variations in performance. Second, because the distribution also shows a model's best attainable performance, it offers practical guidance for model selection: we propose that practitioners perform multiple finetuning runs before selecting the best-performing model. Third, we propose a simple data augmentation technique that improves the overall distribution of model performance. Finally, we suggest improvements to the finetuning and model selection process as promising directions for future work.
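
As a rough illustration of the first two recommendations, the sketch below summarizes a metric across several finetuning runs as a distribution and identifies the best-performing seed. It is a minimal sketch under stated assumptions, not the authors' evaluation code: the function name `summarize_runs`, the choice of summary statistics, and the scores themselves are hypothetical placeholders and are not results from the paper.

```python
import statistics


def summarize_runs(scores_by_seed):
    """Summarize one evaluation metric across finetuning runs.

    scores_by_seed maps a random seed to the score obtained by the run
    finetuned with that seed (hypothetical input format).
    """
    scores = list(scores_by_seed.values())
    # Seed whose run achieved the highest score (used for model selection).
    best_seed = max(scores_by_seed, key=scores_by_seed.get)
    return {
        "n_runs": len(scores),
        "min": min(scores),
        "median": statistics.median(scores),
        "max": max(scores),
        "mean": statistics.mean(scores),
        # Sample standard deviation needs at least two runs.
        "stdev": statistics.stdev(scores) if len(scores) > 1 else 0.0,
        "best_seed": best_seed,
    }


# Placeholder scores for illustration only; NOT results from the paper.
placeholder_scores = {0: 0.80, 1: 0.82, 2: 0.84, 3: 0.86, 4: 0.88}
summary = summarize_runs(placeholder_scores)
print(summary)
```

Reporting the minimum, median, and maximum alongside the mean shows the spread that a single averaged value would hide, and `best_seed` points to the run a practitioner would keep after multiple finetuning runs.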
