Enhanced Protein-Protein Interactions Extraction from the Literature using Entity Type- and Position-aware Representation

Anonymous

16 Jan 2022 (modified: 05 May 2023)
ACL ARR 2022 January Blind Submission
Readers: Everyone
Abstract: Since protein-protein interactions (PPIs) are crucial to understanding living systems, harvesting these data is essential to probe the development of diseases and to understand gene/protein functions and biological processes. Some curated datasets contain PPI data derived from the literature and other sources (e.g., IntAct, BioGrid, DIP and HPRD), but these are far from exhaustive and their maintenance is a labor-intensive process. On the other hand, machine learning (ML) methods to automate PPI knowledge extraction from the scientific literature have been limited by a shortage of appropriate annotated data. In this work, we create a unified multi-source PPI corpus with vetted interaction definitions, augmented by binary interaction type labels. We also present a Transformer-based deep learning method that exploits entity type and positional information for relation representation to improve relation classification performance. We evaluated our model on three widely studied relation extraction datasets from the biology and computer science domains, as well as on our target PPI datasets, to assess the effectiveness of the representation for relation extraction across domains, and found that it outperforms prior state-of-the-art (SOTA) models.
Paper Type: long
