- Keywords: keyword extraction, citation graphs
- TL;DR: A good tagger gives similar tags to a given paper and the papers it cites
- Abstract: A collection of scientific papers is often accompanied by tags (keywords, topics, concepts, etc.) associated with each paper. Sometimes these tags are human-generated; sometimes they are machine-generated. We propose a simple measure of the consistency of a tagging of scientific papers: whether the tags are predictive of links in the citation graph. Since authors tend to cite papers on topics close to those of their own publications, a consistent tagging system should be able to predict citations. We present an algorithm to calculate consistency, along with experiments on human- and machine-generated tags. We show that augmentation, i.e., combining manual tags with machine-generated ones, can enhance the consistency of the tags. We further introduce cross-consistency: the ability to predict citation links between papers tagged by different taggers, e.g., one manually and the other by a machine. Cross-consistency can be used to evaluate tagging quality when the amount of labeled data is limited.
- Archival status: Archival
- Subject areas: Information Extraction, Knowledge Representation, Applications: Science
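
The abstract's idea that consistent tags should predict citation links can be illustrated with a small sketch. This is not the paper's algorithm, which the abstract does not specify; it is one possible operationalization under stated assumptions: tag similarity is measured with Jaccard overlap, citation links are treated as undirected, and consistency is reported as the AUC, i.e. the probability that a randomly chosen linked pair of papers has more similar tags than a randomly chosen unlinked pair. The names `jaccard` and `consistency_auc` are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap of two tag collections (0.0 when both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def consistency_auc(tags, citations):
    """Illustrative consistency score (an assumption, not the paper's method).

    tags:      dict mapping paper id -> iterable of tags
    citations: iterable of (citing, cited) paper id pairs

    Returns the AUC of tag similarity as a predictor of citation links:
    the fraction of (linked pair, unlinked pair) comparisons in which the
    linked pair has the higher Jaccard score, counting ties as one half.
    """
    # Treat links as undirected for this sketch.
    linked = {frozenset(pair) for pair in citations}
    pos, neg = [], []
    for p, q in combinations(sorted(tags), 2):
        score = jaccard(tags[p], tags[q])
        if frozenset((p, q)) in linked:
            pos.append(score)
        else:
            neg.append(score)
    if not pos or not neg:
        raise ValueError("need at least one linked and one unlinked pair")
    wins = sum((s > t) + 0.5 * (s == t) for s in pos for t in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented purely for illustration.
citations = {("A", "B"), ("D", "C")}
topical_tags = {
    "A": {"graphs", "citations"},
    "B": {"graphs", "tagging"},
    "C": {"proteins"},
    "D": {"proteins", "folding"},
}
unrelated_tags = {"A": {"x"}, "B": {"y"}, "C": {"x"}, "D": {"y"}}

topical_score = consistency_auc(topical_tags, citations)
unrelated_score = consistency_auc(unrelated_tags, citations)
print(topical_score, unrelated_score)
```

In this toy setup, tags that follow the citation structure score higher than tags assigned without regard to it. A score near 0.5 would indicate tags that carry no information about citations. The all-pairs loop is quadratic in the number of papers, so a real corpus would need sampling of unlinked pairs.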