Abstract: Clinical trials are pivotal in medical research. Prospective clinical research provides a systematic approach to collecting patient data, but it faces long durations, high costs, and, most crucially, data scarcity. To address these challenges, this paper introduces a novel approach: using cross-table generation to create relevant data. Unlike existing work focused on single-table operations, our method leverages data from multiple sources across various tables, integrating diverse data types and keeping the data consistent across tables. We develop a new framework, MedTransTab, tailored for cross-table tabular data generation in the medical context. This framework extends our previous efforts and is built upon the newly constructed PMC-Struct, derived from an unstructured PMC-patient dataset. MedTransTab can generate high-quality patient records, synthesizing detailed biomedical information to align with real or simulated tables from multiple sources. Experiments show that the proposed method significantly improves performance in cross-table tasks. On the PMC-Struct-Plus dataset, we observe an average improvement of 28.85% in data generation and prediction. Similarly, on the out-of-domain (OOD) dataset, there is an average improvement of 22.56%, indicating substantial progress in medical data analysis.