Versatile latent distribution-preserving tabular data synthesis-based endovascular treatment selection for intracranial aneurysm
Abstract: Proper decision-making for endovascular treatment (EVT) is crucial for reducing complications of intracranial aneurysms (IAs) and improving patients’ quality of life. Electronic medical records (EMRs) contain comprehensive patient information and thus provide substantial data support for EVT decisions. Machine learning (ML) models perform well at capturing the relationship between clinical indicators and treatment decisions based on EMRs. However, they are still hampered by the data scarcity and imbalanced class distribution of EMRs. To address these issues, we propose a latent distribution-preserving structure that generates authentic EMRs in a category-aware manner. We first map the EMRs into the latent space to obtain the original prototypes, and then generate minority prototypes by augmenting within the latent distributions of the minority categories. The original and synthesized minority prototypes are then reconstructed and synthesized into EMRs, respectively. The reconstruction process aligns the original and synthesized data spaces, promoting high authenticity in sample generation under data scarcity. Subsequently, the synthesized and original EMRs are merged to obtain sufficient and balanced training data for ML models. Experimental results demonstrate that the proposed structure enhances the generative performance of various state-of-the-art tabular GANs and boosts treatment-decision performance by 2.8%–7.2% in AUC and 3.4%–17.2% in F1. The resources will be publicly available.
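The pipeline described in the abstract (encode, augment within the minority latent distribution, decode, merge) can be illustrated with a minimal sketch. This is not the proposed model: it assumes a linear PCA projection as a stand-in for the learned encoder and decoder, a single Gaussian as a stand-in for the minority latent distribution, and purely numeric toy features. The function name `synthesize_minority` and all of its parameters are hypothetical.

```python
import numpy as np

def synthesize_minority(X, y, minority_label, n_new, latent_dim=2, seed=0):
    """Illustrative stand-in only: PCA plays the encoder/decoder and a
    Gaussian plays the minority latent distribution (both are assumptions)."""
    rng = np.random.default_rng(seed)
    mean = X.mean(axis=0)
    # "Encoder": project centered records onto the top principal directions.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    W = vt[:latent_dim]                    # shape (latent_dim, n_features)
    Z = (X - mean) @ W.T                   # latent prototypes of all records
    Z_min = Z[y == minority_label]         # prototypes of the minority class
    # Augment inside the minority latent distribution: fit a Gaussian, sample.
    mu = Z_min.mean(axis=0)
    cov = np.atleast_2d(np.cov(Z_min, rowvar=False)) + 1e-6 * np.eye(latent_dim)
    Z_new = rng.multivariate_normal(mu, cov, size=n_new)
    # "Decoder": map the sampled prototypes back to the record space.
    X_new = Z_new @ W + mean
    # Merge synthetic minority records with the original data.
    X_aug = np.vstack([X, X_new])
    y_aug = np.concatenate([y, np.full(n_new, minority_label, dtype=y.dtype)])
    return X_aug, y_aug

# Toy imbalanced data: 40 majority records, 8 minority records, 5 features.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, size=(40, 5)),
               rng.normal(2.0, 1.0, size=(8, 5))])
y = np.array([0] * 40 + [1] * 8)

X_aug, y_aug = synthesize_minority(X, y, minority_label=1, n_new=32)
print(X_aug.shape, np.bincount(y_aug))     # class counts after balancing
```

Sampling in the latent space rather than directly in the record space keeps the synthetic minority records on the same low-dimensional structure as the real ones, which is the intuition behind the distribution-preserving design. A real EMR table would also need handling for categorical columns, which this sketch omits.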