Text classification can distinguish mainstream and fringe scientific papers

Anonymous

19 Jun 2019 (modified: 28 Jun 2019) | OpenReview Anonymous Preprint Blind Submission | Readers: Everyone

Abstract: In this work, I explore the use of supervised learning to distinguish mainstream from fringe scientific papers. The work has two goals. The first is to determine whether mainstream and fringe scientific papers can be reliably distinguished by automated means. The second is to determine whether classifiers trained on stylometric features, such as word count, average word and sentence lengths, and frequencies of part-of-speech sequences, can outperform conventional n-gram document models when classifying papers across scientific topics. I conduct a systematic study of how well classifiers distinguish mainstream and fringe papers across topics, for example by training a classifier on biophysics papers and testing it on cosmology papers. The term-based and style-based approaches both perform significantly better than chance, and neither consistently outperforms the other. Classifiers trained on the combined feature set (i.e., n-gram frequencies and stylometric features) perform little better than those trained on either feature set alone, suggesting that the two feature sets are, in aggregate, highly correlated. Overall, the results suggest that mainstream and fringe scientific papers are readily distinguishable by conventional text classification methods.
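
As a rough illustration of the simpler stylometric features named in the abstract (word count, average word length, average sentence length), the following is a minimal sketch using only the Python standard library. The function name `stylometric_features`, the regular expressions, and the sentence-splitting heuristic are assumptions made for this example; the abstract does not specify how the features were actually computed, and part-of-speech sequence frequencies are omitted because they would need a tagger.

```python
import re


def stylometric_features(text):
    """Return a few simple style features for one document (illustrative only)."""
    # Crude sentence split: break on whitespace that follows ., ! or ?
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Treat runs of letters, optionally with an internal apostrophe, as words
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    n_words = len(words)
    n_sentences = len(sentences)
    return {
        "word_count": n_words,
        "avg_word_length": sum(len(w) for w in words) / n_words if n_words else 0.0,
        "avg_sentence_length": n_words / n_sentences if n_sentences else 0.0,
    }
```

A feature dictionary like this could be computed per paper and passed to any standard classifier, with training papers drawn from one topic and test papers from another to mirror the cross-topic setup the abstract describes.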
