Exploring metadata matching for reference linking

Anonymous

16 Dec 2023 | ACL ARR 2023 December Blind Submission | Readers: Everyone
TL;DR: We study reference linking, i.e., identifying the paper cited by a given reference, and compare lexical and semantic similarity over metadata in terms of accuracy and inference speed.
Abstract: Reference linking, the task of identifying which paper in a database a given reference cites, is an important part of academic publishing. In this work, we explored reference linking based on the lexical and semantic similarity between the metadata of references and that of candidate papers. Our experiments affirmed the strong accuracy of Jaccard similarity reported by prior work (lowest percentage error of 0.95%) but also highlighted its slow inference (0.88 to 1.89 s per query reference, depending on the amount of metadata used). In contrast, semantic similarity-based linking had about twice the error rate (1.90%) while being 94 times faster (0.02 s per query reference). We recommend that future reference linking efforts employ a mixed approach: first apply the coarser but faster semantic similarity-based linking, and then, only if no candidate achieves a high semantic similarity score, fall back to the slower but more accurate Jaccard-based lexical linking.
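The mixed approach recommended in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the word-level tokenization, the `embed` callable (a stand-in for whatever sentence-embedding model is used), and the `threshold` value of 0.9 are all assumptions made for illustration, since the abstract specifies none of them.

```python
import math
import re
from typing import Callable, Sequence


def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric word tokens (an assumed tokenization)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity: |A intersect B| / |A union B| over token sets."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def link_reference(
    reference: str,
    candidates: Sequence[str],
    embed: Callable[[str], Sequence[float]],  # hypothetical embedding function
    threshold: float = 0.9,                   # assumed cut-off, not from the paper
) -> int:
    """Return the index of the candidate judged to be the cited paper.

    Step 1: pick the candidate with the highest semantic (cosine) similarity.
    Step 2: if that score is below `threshold`, fall back to Jaccard.
    """
    if not candidates:
        raise ValueError("candidates must not be empty")

    ref_vec = embed(reference)
    # In a real system the candidate embeddings would be precomputed and indexed.
    semantic = [cosine(ref_vec, embed(c)) for c in candidates]
    best = max(range(len(candidates)), key=semantic.__getitem__)
    if semantic[best] >= threshold:
        return best

    # Slower lexical fallback over the same candidates.
    return max(range(len(candidates)), key=lambda i: jaccard(reference, candidates[i]))
```

With this structure, the slower Jaccard pass runs only for references that the semantic step cannot link confidently, which is the trade-off the abstract describes. The threshold would need to be tuned on held-out data.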
Paper Type: short
Research Area: NLP Applications
Contribution Types: NLP engineering experiment, Reproduction study
Languages Studied: English