Towards Machine-Assisted Biomedical Data Preparation: A Use Case on Disparity in Access to Health Care

26 Sept 2023 | OpenReview Archive Direct Upload | Readers: Everyone
Abstract: Data preparation is a time-consuming task required for data analytics. In the biomedical field, we observe that datasets tend to have a large number of diverse variables, especially when we consider data coming from healthcare facilities. When data analytics depends on variables from several studies, one approach is to use semantics to annotate the variables and to support their alignment and combination. We propose a novel use of semantics to support biomedical data preparation, specifically the use of semantic variable normalization in support of machine-assisted biomedical data preparation. To illustrate our approach, we present a use case on disparity in access to health care using data from the U.S. National Health and Nutrition Examination Surveys (NHANES), one of the most studied biomedical datasets in the U.S. This use case is a multi-cycle study of disparities in access to needed care that requires the semantic combination of data from three survey cycles. We demonstrate that NHANES data can be normalized and accessed regardless of cycle by using a semantic representation of study variables and a semantically-enabled faceted search. This approach can reduce the time required for data understanding and preparation, especially in settings like NHANES where it is common to combine data from several cycles.
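As a rough illustration of what normalizing variables across cycles could look like, the sketch below maps cycle-specific variable names onto one shared concept and then combines the records. It is not the authors' implementation: the cycle labels, variable names (`VAR_A1`, `VAR_B7`, `VAR_C3`), the concept name `needed_care_delayed`, and the helper functions are all hypothetical, and a real system would draw the mapping from a semantic annotation of the study variables rather than from a hand-written dictionary.

```python
# Hypothetical mapping from (cycle, cycle-specific variable name) to a shared
# semantic concept. Every name here is invented for illustration only.
VARIABLE_CONCEPTS = {
    ("cycle_A", "VAR_A1"): "needed_care_delayed",
    ("cycle_B", "VAR_B7"): "needed_care_delayed",
    ("cycle_C", "VAR_C3"): "needed_care_delayed",
}


def normalize(cycle, records):
    """Rename cycle-specific variables to their shared concept names.

    Variables without a known concept are dropped from the output.
    """
    normalized = []
    for record in records:
        row = {"cycle": cycle}
        for name, value in record.items():
            concept = VARIABLE_CONCEPTS.get((cycle, name))
            if concept is not None:
                row[concept] = value
        normalized.append(row)
    return normalized


def combine(cycles):
    """Normalize each cycle's records and concatenate them into one list."""
    combined = []
    for cycle, records in cycles.items():
        combined.extend(normalize(cycle, records))
    return combined
```

With such a mapping in place, an analysis can query the shared concept name without knowing which variable name each cycle used.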