Conditional Adversarial Random Forest for Synthetic Electronic Health Record Generation

Published: 22 Sept 2025, Last Modified: 22 Sept 2025
Venue: WiML @ NeurIPS 2025
License: CC BY 4.0
Keywords: Machine Learning, Synthetic Data Generation, Healthcare, Adversarial Conditional Random Forests
Abstract: Synthetic Electronic Health Records (EHRs) enable privacy-preserving healthcare data sharing for machine learning research. However, existing methods struggle to maintain temporal consistency across patient visits while preserving demographic-clinical correlations: current approaches either sacrifice temporal fidelity or require extensive postprocessing. We propose Conditional Adversarial Random Forest (CARF), which extends Adversarial Random Forest [1] with a two-model strategy. The first model generates patient-level demographics that remain static across visits. The second, conditional model produces visit-level clinical variables, incorporating visit rank and time progression to create complete patient trajectories. This design eliminates manual postprocessing and preserves temporal patterns by construction.
Submission Number: 207
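
The two-model strategy described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and does not use Adversarial Random Forest: both "models" are hypothetical stand-ins that simply resample pooled real values, so the output is not privacy-preserving. The toy records, the field names (`pid`, `rank`, `gap_days`, `dx`), and the choice of (sex, visit rank) as the conditioning key are all assumptions made for illustration. The sketch only shows the data flow: patient-level attributes are drawn once, then visit-level rows are drawn conditionally on them and on visit rank, with elapsed time accumulated across visits.

```python
import random
from collections import defaultdict

# Hypothetical toy records: one row per visit, demographics repeated per visit.
REAL_VISITS = [
    {"pid": 1, "sex": "F", "birth_year": 1950, "rank": 1, "gap_days": 0, "dx": "I10"},
    {"pid": 1, "sex": "F", "birth_year": 1950, "rank": 2, "gap_days": 30, "dx": "E11"},
    {"pid": 2, "sex": "M", "birth_year": 1972, "rank": 1, "gap_days": 0, "dx": "J45"},
    {"pid": 3, "sex": "F", "birth_year": 1961, "rank": 1, "gap_days": 0, "dx": "E11"},
    {"pid": 3, "sex": "F", "birth_year": 1961, "rank": 2, "gap_days": 90, "dx": "I10"},
    {"pid": 3, "sex": "F", "birth_year": 1961, "rank": 3, "gap_days": 14, "dx": "I10"},
]


def fit_patient_model(visits):
    """Stand-in for model 1: one (sex, birth_year, n_visits) tuple per real patient."""
    patients = {}
    for v in visits:
        sex, year, n = patients.get(v["pid"], (v["sex"], v["birth_year"], 0))
        patients[v["pid"]] = (sex, year, max(n, v["rank"]))
    return list(patients.values())


def fit_visit_model(visits):
    """Stand-in for model 2: visit-level values pooled by the key (sex, rank)."""
    pools = defaultdict(list)
    for v in visits:
        pools[(v["sex"], v["rank"])].append((v["gap_days"], v["dx"]))
    return pools


def generate(n_patients, patient_model, visit_model, rng):
    """Draw static demographics once, then visits conditioned on them and on rank."""
    rows = []
    for new_pid in range(1, n_patients + 1):
        sex, birth_year, n_visits = rng.choice(patient_model)  # stage 1: patient level
        day = 0
        for rank in range(1, n_visits + 1):                    # stage 2: visit level
            pool = visit_model.get((sex, rank))
            if not pool:  # no real visits for this key: stop the trajectory early
                break
            gap, dx = rng.choice(pool)
            day += gap  # time progresses monotonically across visits
            rows.append({"pid": new_pid, "sex": sex, "birth_year": birth_year,
                         "rank": rank, "day": day, "dx": dx})
    return rows


synthetic = generate(5, fit_patient_model(REAL_VISITS),
                     fit_visit_model(REAL_VISITS), random.Random(0))
for row in synthetic:
    print(row)
```

Because demographics are sampled once per patient and copied onto every visit row, they stay constant across a trajectory without any post hoc repair step. In an actual CARF pipeline, each stand-in would presumably be replaced by a fitted generative forest; that substitution is not shown here.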