A Benchmark of Discovering Drug-Target Interaction from Biomedical Literature

08 Jun 2021 (modified: 24 May 2023) | Submitted to NeurIPS 2021 Datasets and Benchmarks Track (Round 1) | Readers: Everyone
Abstract: With millions of biomedical papers published every year, automatic knowledge discovery (KD) from biomedical literature has become an urgent industry need. Although KD in the biomedical domain has attracted considerable research attention in recent years, the lack of benchmark datasets significantly hinders progress. In this work, we create a dataset, KD-DTI, for discovering <drug, target, interaction> triplets from literature, one of the most important KD tasks in the biomedical domain. KD-DTI contains 14k unique biomedical papers, each associated with at least one <drug, target, interaction> triplet. We also provide a semi-supervised dataset with 139k unique papers. We present and analyze multiple solutions, including several extractive and generative models and two data enhancement methods. The results show that the performance of these models falls far short of industry demand, indicating that the dataset poses a challenging research problem for the community. The dataset will be freely accessible after the review process.
Supplementary Material: zip
