Abstract: A standard measure of the influence of a research paper is the number of times it is cited. However, papers may be cited for many reasons, and citation count offers limited information about the extent to which a paper affected the content of subsequent publications. We therefore propose a novel method to quantify linguistic influence in timestamped document collections. The method has two main steps: first, identify lexical and semantic changes using contextual embeddings and word frequencies; second, aggregate information about these changes into per-document influence scores by estimating a high-dimensional Hawkes process with a low-rank parameter matrix. We show that this measure of linguistic influence is predictive of future citations: the estimate of linguistic influence from the two years after a paper’s publication is correlated with, and predictive of, its citation count in the following three years. We demonstrate this using an online evaluation with incremental temporal training/test splits, in comparison with a strong baseline that includes predictors for initial citation counts, topics, and lexical features.
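The first step of the method identifies lexical and semantic changes from word frequencies and contextual embeddings. As a rough illustration only, not the authors' procedure, two common change signals are sketched below: an add-alpha smoothed log ratio of a word's relative frequency between an older and a newer period, and the cosine distance between a word's mean contextual embeddings in each period. The function names `frequency_change` and `semantic_change`, the smoothing constant `alpha`, and the toy 2-d vectors are hypothetical; real embeddings are assumed to come from some contextual encoder that is not shown.

```python
from collections import Counter

import numpy as np


def frequency_change(tokens_old, tokens_new, alpha=1.0):
    """Add-alpha smoothed log ratio of each word's relative frequency (new vs. old period)."""
    old, new = Counter(tokens_old), Counter(tokens_new)
    vocab = set(old) | set(new)
    n_old = len(tokens_old) + alpha * len(vocab)
    n_new = len(tokens_new) + alpha * len(vocab)
    # Positive values mean the word became relatively more frequent in the new period.
    return {
        w: float(np.log((new[w] + alpha) / n_new) - np.log((old[w] + alpha) / n_old))
        for w in vocab
    }


def semantic_change(emb_old, emb_new):
    """Cosine distance between a word's mean contextual embedding in two periods."""
    a = np.mean(emb_old, axis=0)
    b = np.mean(emb_new, axis=0)
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Usage with toy inputs (hypothetical; not data from the paper).
old_text = "the model uses recurrent networks".split()
new_text = "the model uses attention and attention heads".split()
freq = frequency_change(old_text, new_text)
print(freq["attention"] > 0, freq["recurrent"] < 0)  # True True
emb_old = np.array([[1.0, 0.0], [0.9, 0.1]])  # toy 2-d "contextual" vectors
emb_new = np.array([[0.1, 0.9], [0.0, 1.0]])
print(round(semantic_change(emb_old, emb_new), 3))
```

Averaging token embeddings into one centroid per period is the simplest choice; it ignores multiple senses of a word, which clustering-based change detectors try to capture.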
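The second step aggregates detected changes into per-document influence scores by estimating a high-dimensional Hawkes process with a low-rank parameter matrix. The sketch below shows only the model structure under stated assumptions, not the authors' estimator: a discrete-time multivariate Hawkes process with an exponential decay kernel, where `counts[t, j]` is the number of events from a hypothetical "source" `j` at time step `t`, and the excitation matrix is factored as `A = U @ V.T` with rank `r`. How the paper maps documents and linguistic changes onto dimensions is not specified here, and the column-sum `influence_scores` function is a hypothetical illustration. Fitting `mu`, `U`, and `V` (for example by maximizing the Poisson log-likelihood) is omitted.

```python
import numpy as np


def decayed_history(counts, beta):
    """h[t, j] = sum over s < t of exp(-beta * (t - s)) * counts[s, j]."""
    T, K = counts.shape
    h = np.zeros((T, K))
    decay = np.exp(-beta)
    for t in range(1, T):
        # Decay the previous history by one step and add the last step's events.
        h[t] = decay * (h[t - 1] + counts[t - 1])
    return h


def hawkes_intensity(counts, mu, U, V, beta):
    """lambda[t, i] = mu[i] + sum_j A[i, j] * h[t, j], with A = U @ V.T of rank r."""
    h = decayed_history(counts, beta)
    # h @ A.T equals (h @ V) @ U.T, so the K x K matrix A is never formed.
    return mu + (h @ V) @ U.T


def poisson_log_likelihood(counts, intensity):
    """Poisson log-likelihood of the counts, dropping the constant log(counts!) term."""
    return float(np.sum(counts * np.log(intensity) - intensity))


def influence_scores(U, V):
    """Hypothetical score: total excitation source j exerts, i.e. column sums of A."""
    return V @ U.sum(axis=0)


# Usage on synthetic numbers (no real data): K sources, rank r, T time steps.
rng = np.random.default_rng(0)
K, r, T = 5, 2, 20
U = rng.uniform(0.0, 0.3, size=(K, r))  # nonnegative factors keep A >= 0
V = rng.uniform(0.0, 0.3, size=(K, r))
mu = np.full(K, 0.1)
counts = rng.poisson(0.5, size=(T, K)).astype(float)
lam = hawkes_intensity(counts, mu, U, V, beta=0.7)
print(poisson_log_likelihood(counts, lam))
print(influence_scores(U, V))
```

The low-rank factorization is what keeps a high-dimensional model tractable: it stores `2 * K * r` parameters instead of `K * K`, and each intensity evaluation costs `O(K * r)` per time step.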
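The evaluation uses an online protocol with incremental temporal training/test splits and compares against a baseline of other predictors. A minimal sketch of that kind of protocol, assuming an expanding window (train on all papers published before the test year, test on that year) and ordinary least squares as the regressor, is shown below; the function names, the `gap` parameter, and the synthetic columns standing in for initial citation counts, topics, and the influence score are all hypothetical. A faithful setup would choose `gap` so that the training papers' future-citation windows have closed before the test year.

```python
import numpy as np


def temporal_splits(years, first_test_year, gap=0):
    """Yield (test_year, train_mask, test_mask): train on years < test_year - gap."""
    years = np.asarray(years)
    for test_year in sorted(set(years[years >= first_test_year].tolist())):
        yield test_year, years < test_year - gap, years == test_year


def fit_predict(X_train, y_train, X_test):
    """Ordinary least squares with an intercept, then predict on the test rows."""
    def with_intercept(X):
        return np.column_stack([np.ones(len(X)), X])

    coef, *_ = np.linalg.lstsq(with_intercept(X_train), y_train, rcond=None)
    return with_intercept(X_test) @ coef


def online_mse(X, y, years, first_test_year, gap=0):
    """Test-set mean squared error for each successive (expanding-window) test year."""
    return {
        test_year: float(np.mean((fit_predict(X[tr], y[tr], X[te]) - y[te]) ** 2))
        for test_year, tr, te in temporal_splits(years, first_test_year, gap)
    }


# Usage on synthetic data; the columns only stand in for real predictors.
rng = np.random.default_rng(1)
n = 400
years = np.repeat(np.arange(2000, 2010), n // 10)
baseline = rng.normal(size=(n, 2))   # stand-in for initial citations, topic features
influence = rng.normal(size=(n, 1))  # stand-in for an early influence score
y = baseline @ np.array([0.5, 0.2]) + 0.3 * influence[:, 0] + rng.normal(scale=0.5, size=n)
mse_base = online_mse(baseline, y, years, first_test_year=2005)
mse_full = online_mse(np.column_stack([baseline, influence]), y, years, first_test_year=2005)
for yr in mse_base:
    print(yr, round(mse_base[yr], 3), round(mse_full[yr], 3))
```

Because `y` here is generated from the synthetic influence column, any gap between the two error columns says nothing about real citation data; the sketch only shows how baseline and baseline-plus-influence models are compared year by year.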