StructGen: Leveraging Structured EHR Prompts and Biomedical BERTs for Chest X-ray Synthesis

SUCHIT PATEL; Karandeep Singh Sodhi; Manik Gupta; Mei-Tai Chu

Published: 19 Aug 2025, Last Modified: 12 Oct 2025, BHI 2025, CC BY 4.0

Confirmation: I have read and agree with the IEEE BHI 2025 conference submission's policy on behalf of myself and my co-authors.

Keywords: RoentGen, medical image synthesis, CXR generation, clinical text conditioning, biomedical language models, prompt engineering.

TL;DR: This study evaluates prompt strategies and biomedical BERT models for text-conditioned chest X-ray generation, offering clinically relevant insights to improve synthetic data quality and support diagnostic applications in healthcare.

Abstract: Generating synthetic chest X-rays from textual clinical data has shown significant promise for augmenting medical datasets and supporting downstream diagnostic tasks. This study extends the RoentGen framework, a latent diffusion-based image generator, by systematically evaluating how prompts derived from structured Electronic Health Record (EHR) fields and domain-specific Bidirectional Encoder Representations from Transformers (BERT) language models affect image quality and semantic fidelity. We propose four prompt strategies derived from structured EHR fields (Detailed, Disease, Demographic, and Device) and examine their effect on the realism of the synthesized images and their alignment with the input text. Additionally, we compare ten biomedical encoders, including ClinicalBERT, BioBERT, and PubMedBERT, across multiple visual-semantic metrics, including Structural Similarity Index Measure (SSIM), Peak Signal-to-Noise Ratio (PSNR), Learned Perceptual Image Patch Similarity (LPIPS), Contrastive Language-Image Pretraining Score (CLIPScore), and Fréchet Inception Distance with X-Radiology Vision features (FID-XRV). Our findings show that both the content of the prompt and the choice of encoder substantially affect the quality and interpretability of the generated images. Notably, BioBERT paired with disease-centric prompts consistently yields superior results. This work provides insights for improving conditional medical image generation, particularly in settings where narrative clinical text is limited.
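To make the four prompt strategies concrete, the following is a minimal sketch of one way a structured EHR record could be turned into Detailed, Disease, Demographic, and Device prompts. The record fields (`age`, `sex`, `view`, `findings`, `devices`), the names `EHRRecord` and `build_prompt`, and the sentence templates are all hypothetical illustrations, not the authors' actual prompt templates or data schema.

```python
from dataclasses import dataclass, field


@dataclass
class EHRRecord:
    """Hypothetical structured EHR fields for one chest X-ray study."""
    age: int
    sex: str
    view: str  # projection, e.g. "PA" or "AP"
    findings: list[str] = field(default_factory=list)  # disease labels
    devices: list[str] = field(default_factory=list)   # support devices


def _join(items: list[str], empty: str) -> str:
    # Comma-separate a list of labels, or fall back to a placeholder when empty.
    return ", ".join(items) if items else empty


def build_prompt(record: EHRRecord, strategy: str) -> str:
    """Build a text prompt from structured fields (hypothetical templates).

    Each strategy keeps the same sentence frame and varies only which
    EHR fields are included.
    """
    base = f"{record.view} chest X-ray"
    demographic = f" of a {record.age}-year-old {record.sex}"
    disease = f"; findings: {_join(record.findings, 'no acute findings')}"
    device = f"; support devices: {_join(record.devices, 'none')}"

    if strategy == "disease":
        return base + disease
    if strategy == "demographic":
        return base + demographic
    if strategy == "device":
        return base + device
    if strategy == "detailed":
        # Detailed combines every available field.
        return base + demographic + disease + device
    raise ValueError(f"unknown prompt strategy: {strategy!r}")


if __name__ == "__main__":
    rec = EHRRecord(age=67, sex="male", view="PA",
                    findings=["cardiomegaly", "pleural effusion"],
                    devices=["pacemaker"])
    for s in ("detailed", "disease", "demographic", "device"):
        print(f"{s:>11}: {build_prompt(rec, s)}")
```

Under these assumptions, only the set of EHR fields changes between strategies while the sentence frame stays fixed, so differences in generated images would reflect prompt content rather than phrasing. The resulting string would then be tokenized by the chosen text encoder (for example, BioBERT) in place of a free-text report.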

Track: 3. Imaging Informatics

Registration Id: DMNJJR6V6CC

Submission Number: 63
