Mapping Study Variables to Common Data Elements Using GPT for Sheets: Towards Standardized Data Collection and Sharing
Abstract: Secondary use, or reuse, of biomedical research data has drawn significant attention and is of growing importance. The non-standardized representation and wide variability of clinical and observational data in biomedical research pose a major challenge to data collection, integration, analysis, and sharing. To address this challenge, we present our experience of using GPT for Sheets as a tool to map study-level biomedical research variables, or data elements, to Common Data Elements (CDEs). In this study, the trial variables were extracted from the data dictionaries of the RADx-rad program, and the CDEs recommended by the NIH Pediatric COVID-19 Working Group for studies that include children (0–21 years) were used as the reference CDEs. Mapping experimental variables (vague data elements) to CDEs resulted in an F1-score of 0.82 when considering only the first recommendation and 0.85 when considering either the first or second recommendation. Our results indicate that GPT language models can differentiate freely defined study variables and map them to standard CDEs. This, in turn, could enhance interoperability across a wider range of biomedical studies.