NeuroTrialNER: An Annotated Corpus for Neurological Diseases and Therapies in Clinical Trial RegistriesDownload PDF

Anonymous

16 Feb 2024ACL ARR 2024 February Blind SubmissionReaders: Everyone
Abstract: Extracting and aggregating information from clinical trial registries could provide invaluable insights into the drug development landscape and advance the treatment of neurologic diseases. However, achieving this at scale is hampered by the volume of available data and the lack of an annotated corpus to assist in the development of automation tools. Thus, we introduce NeuroTrialNER, a new and fully open corpus for named entity recognition (NER). It comprises 893 clinical trial summaries sourced from ClinicalTrials.gov, annotated for neurological diseases, interventions, and control treatments. We describe our data collection process and the corpus in detail. We demonstrate its utility for NER using large language models and achieve a close-to-human performance. By bridging the gap in data resources, we hope to foster the development of text processing applications that help researchers navigate clinical trials data more easily, efficiently, and comprehensively.
Paper Type: long
Research Area: Resources and Evaluation
Contribution Types: Data resources, Data analysis
Languages Studied: English
0 Replies

Loading

OpenReview is a long-term project to advance science through improved peer review with legal nonprofit status. We gratefully acknowledge the support of the OpenReview Sponsors. © 2025 OpenReview