Keywords: keyword extraction, citation graphs
TL;DR: A good tagger gives similar tags to a given paper and the papers it cites
Abstract: A collection of scientific papers is often accompanied by tags:
keywords, topics, concepts, etc., associated with each paper.
Sometimes these tags are human-generated; sometimes they are
machine-generated. We propose a simple measure of the consistency
of the tagging of scientific papers: whether the tags are
predictive of links in the citation graph. Since authors tend to
cite papers on topics close to those of their own publications, a
consistent tagging system could predict citations. We present an
algorithm to calculate consistency and report experiments with human- and
machine-generated tags. We show that augmentation, i.e., the combination
of manual tags with machine-generated ones, can enhance the
consistency of the tags. We further introduce cross-consistency:
the ability to predict citation links between papers tagged by
different taggers, e.g., manually and by a machine.
Cross-consistency can be used to evaluate tagging quality when
the amount of labeled data is limited.
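The abstract does not spell out the consistency algorithm. Purely as an illustration of the idea that tags should predict citation links, the sketch below assumes two choices of our own, not necessarily the authors': Jaccard similarity between tag sets, and an AUC-style score giving the probability that a citing pair (citing paper, cited paper) has higher tag similarity than a non-citing pair. Passing the same tag dictionary twice gives plain consistency; passing two different taggers' dictionaries (one for the citing side, one for the cited side) gives a cross-consistency variant. The function names `jaccard` and `consistency` and the toy data are hypothetical.

```python
from itertools import permutations


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two tag sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def consistency(
    tags_src: dict[str, set[str]],
    tags_dst: dict[str, set[str]],
    citations: list[tuple[str, str]],
) -> float:
    """Hypothetical AUC-style consistency score (an assumption, not the paper's formula).

    tags_src tags the citing paper and tags_dst tags the cited paper.
    Returns the probability that a citing pair has higher tag similarity
    than a non-citing pair, counting ties as 0.5.
    """
    papers = sorted(set(tags_src) & set(tags_dst))
    linked = set(citations)
    pos, neg = [], []
    # Score every ordered pair of distinct papers and split by citation link.
    for u, v in permutations(papers, 2):
        s = jaccard(tags_src[u], tags_dst[v])
        (pos if (u, v) in linked else neg).append(s)
    if not pos or not neg:
        raise ValueError("need both citing and non-citing pairs")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): paper "a" cites paper "b".
manual = {"a": {"graphs", "citations"}, "b": {"graphs"}, "c": {"proteins"}}
machine = {"a": {"graphs"}, "b": {"graphs", "networks"}, "c": {"proteins"}}
citations = [("a", "b")]

print(consistency(manual, manual, citations))   # plain consistency of manual tags
print(consistency(manual, machine, citations))  # cross-consistency: manual -> machine
```

On this toy data the manual tags score 0.9 and the manual-to-machine cross-consistency scores 0.8. A ranking score of this kind needs no threshold on tag similarity, and 0.5 corresponds to tags that carry no information about citations. Real citation graphs would call for sampling non-citing pairs rather than enumerating all of them.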
Archival Status: Archival
Subject Areas: Information Extraction, Knowledge Representation, Applications: Science