How to measure the consistency of the tagging of scientific papers?

Anonymous

17 Nov 2018 (modified: 05 May 2023) · AKBC 2019 Conference Withdrawn Submission · Readers: Everyone

Keywords: keyword extraction, citation graphs

TL;DR: A good tagger gives similar tags to a given paper and the papers it cites

Abstract: A collection of scientific papers is often accompanied by tags (keywords, topics, concepts, etc.) associated with each paper. These tags may be human-generated or machine-generated. We propose a simple measure of the consistency of the tagging of scientific papers: whether the tags are predictive of links in the citation graph. Since authors tend to cite papers on topics close to those of their own publications, a consistent tagging system should be able to predict citations. We present an algorithm to calculate consistency and report experiments with human- and machine-generated tags. We show that augmentation, i.e., combining the manual tags with machine-generated ones, can enhance the consistency of the tags. We further introduce cross-consistency, the ability to predict citation links between papers tagged by different taggers, e.g., one paper tagged manually and the other by a machine. Cross-consistency can be used to evaluate tagging quality when the amount of labeled data is limited.
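The abstract does not specify the consistency algorithm. What follows is a minimal sketch of one plausible way to score it, under stated assumptions. It uses Jaccard overlap between tag sets as the similarity. It then computes an AUC-style score: the fraction of comparisons in which a real citation u → v has higher tag similarity than a pair (u, w) where u does not cite w, with ties counting 1/2. A score of 0.5 means the tags carry no citation signal. Passing a second tagger's output for the cited side gives a cross-consistency score. A per-paper union of manual and machine tags stands in for "augmentation". The names `jaccard`, `consistency`, `augment` and the toy data are hypothetical illustrations, not the authors' method.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

Tags = Dict[str, Set[str]]  # paper id -> set of tags (hypothetical representation)


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap of two tag sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def consistency(citations: Iterable[Tuple[str, str]],
                citing_tags: Tags,
                cited_tags: Optional[Tags] = None) -> float:
    """Hypothetical AUC-style consistency score (not the paper's algorithm).

    For every citation u -> v, compare sim(u, v) with sim(u, w) for every
    tagged paper w that u does not cite. Return the share of comparisons the
    true citation wins, counting ties as 1/2. If cited_tags comes from a
    different tagger, the result is a cross-consistency score.
    """
    if cited_tags is None:
        cited_tags = citing_tags
    cited_by: Dict[str, Set[str]] = {}
    for u, v in set(citations):
        cited_by.setdefault(u, set()).add(v)

    wins, total = 0.0, 0
    for u, targets in cited_by.items():
        # Candidate non-citations: tagged papers u neither cites nor is.
        negatives = set(cited_tags) - targets - {u}
        for v in targets:
            pos = jaccard(citing_tags[u], cited_tags[v])
            for w in negatives:
                neg = jaccard(citing_tags[u], cited_tags[w])
                wins += 1.0 if pos > neg else 0.5 if pos == neg else 0.0
                total += 1
    return wins / total if total else 0.5


def augment(manual: Tags, machine: Tags) -> Tags:
    """Per-paper union of manual and machine tags (one simple reading of augmentation)."""
    return {p: manual.get(p, set()) | machine.get(p, set())
            for p in set(manual) | set(machine)}


# Toy data, invented for illustration only.
CITATIONS = [("A", "B"), ("C", "D")]
MANUAL: Tags = {"A": {"graphs", "citations"}, "B": {"graphs"},
                "C": {"vision"}, "D": {"vision", "cnn"}}
MACHINE: Tags = {"A": {"graphs"}, "B": {"graphs", "networks"},
                 "C": {"vision"}, "D": {"images"}}

print(consistency(CITATIONS, MANUAL))           # 1.0
print(consistency(CITATIONS, MANUAL, MACHINE))  # 0.75 (cross-consistency)
print(consistency(CITATIONS, augment(MANUAL, MACHINE)))
```

This exact version compares each citation against every non-cited paper, so it costs O(E·N) similarity evaluations for E citations and N papers. That is fine for a toy corpus; on a real corpus one would more likely sample a fixed number of negatives per citation.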

Archival Status: Archival

Subject Areas: Information Extraction, Knowledge Representation, Applications: Science
