Enhancing Data Curation for Clinical Trial Registries: Application of Language Models for Drug and Disease Recognition and Normalization
Reviews of clinical trial registries can reveal crucial insights into the quality and scope of medical research. The current process for generating reports from these registries relies heavily on manual data curation, which includes categorizing trials by disease type and classifying drugs. These tasks are time-consuming and prone to human error. In the present work, we explore automated techniques for extracting drug and disease information and for linking the extracted mentions to a medical ontology. By improving data capture and curation, we aim to contribute to the development of new systems for reviewing and monitoring clinical trial registries.