HealthDBFinder: a question-answering task for health database discovery

Published: 01 Jan 2024, Last Modified: 16 May 2025, CBMS 2024, CC BY-SA 4.0
Abstract: Integrating advanced data processing technologies into healthcare has shifted the paradigm of medical studies, which have evolved from data collection towards the management and analysis of Electronic Health Record (EHR) data. This change has improved patient care and expanded the scope of clinical research through the secondary use of existing data. Although secondary use has already been tackled in other initiatives, it has raised new challenges, namely in cohort definition, data discovery, and the evaluation of study feasibility. Database catalogues exist to help with these tasks, but they fail in some cases due to insufficient information. In this paper, we address this challenge by proposing a baseline method for information retrieval, together with a synthetic dataset for further research. The dataset was generated from metadata extracted from real-world databases and therefore represents real problems that do not yet have a solution. The source code of this work is available at http://github.com/bioinformatics-ua/HealthDBFinder.