Extracting and matching authors and affiliations in scholarly documents

Muthukumar Chandrasekaran

29 Oct 2021 | OpenReview Archive Direct Upload | Readers: Everyone

Abstract: We introduce Enlil, an information extraction system that discovers the institutional affiliations of authors in scholarly papers. Enlil consists of two steps: the first identifies authors and affiliations using a conditional random field, and the second uses a support vector machine to connect authors to their affiliations. We benchmark Enlil in three separate experiments, each drawn from a different source: the ACL Anthology Corpus, the ACM Digital Library, and a set of cross-disciplinary scientific journal articles acquired by querying Google Scholar. Against a state-of-the-art production baseline, Enlil reports a statistically significant improvement in F_1 of nearly 10% (p << 0.01). In the case of multidisciplinary articles from Google Scholar, Enlil is benchmarked over both clean input (F_1 > 90%) and automatically acquired input (F_1 > 80%).
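For readers less familiar with the F_1 metric reported above, the following is a minimal sketch of how precision, recall, and F_1 could be computed over predicted author-affiliation pairs. It is not Enlil's evaluation code: the function name `f1_score`, the exact-match pair criterion, and the example names are all assumptions made for illustration.

```python
def f1_score(predicted, gold):
    """Return (precision, recall, F_1) over sets of (author, affiliation) pairs.

    Assumption: a predicted pair counts as correct only if it matches a
    gold pair exactly. The paper's own matching criterion may differ.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F_1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical data, not taken from any of the benchmark corpora.
gold = {
    ("A. Author", "University X"),
    ("B. Author", "University X"),
    ("C. Author", "Institute Y"),
}
predicted = {
    ("A. Author", "University X"),
    ("B. Author", "Institute Y"),  # wrong affiliation for B. Author
    ("C. Author", "Institute Y"),
}

precision, recall, f1 = f1_score(predicted, gold)
print(f"precision={precision:.3f} recall={recall:.3f} F_1={f1:.3f}")
```

In this toy case two of the three predicted pairs are correct, so precision, recall, and F_1 all equal 2/3.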
