Semi-Synthetic Localization Datasets for Radiological Findings on Chest X-Rays

02 Dec 2025 (modified: 15 Dec 2025) | MIDL 2026 Conference Submission | CC BY 4.0
Keywords: Semi-synthetic CXRs, Inpainting, Diffusion models
Abstract: While large datasets for chest X-ray (CXR) finding classification are widely available, datasets for finding localization are scarce. Curating them requires manual annotation by medical experts, which is costly and time-consuming, so existing localization datasets are often small and limited in scope. To address this, we introduce SemiSynCXR, a framework that automatically generates semi-synthetic localization datasets. SemiSynCXR inpaints specific radiological findings into real healthy CXRs at anatomically plausible locations, yielding both the edited image and the ground-truth bounding box for each finding. SemiSynCXR-generated CXRs effectively augment existing localization datasets, mitigating data scarcity and improving generalization. Comprehensive quantitative and qualitative evaluations confirm that the generated findings are realistic and accurately localized, establishing SemiSynCXR as a practical solution for generating CXR finding localization datasets. Code will be released upon acceptance.
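Why a label comes "for free" in this setup: the location is chosen before the finding is inpainted, so the chosen box is the ground-truth annotation. The minimal sketch below illustrates only that placement-and-annotation step, under assumptions that do not come from the submission. It assumes the anatomical region is given as a binary lung mask, and it treats a box as "anatomically plausible" if at least `min_overlap` of its pixels lie inside that mask. It leaves out the diffusion inpainting model that actually renders the finding. `sample_finding_box` and `box_to_inpaint_mask` are hypothetical helpers, not the authors' code.

```python
import random
from typing import List, Optional, Tuple

# (x0, y0, x1, y1) in pixel coordinates; x1 and y1 are exclusive.
Box = Tuple[int, int, int, int]


def sample_finding_box(
    lung_mask: List[List[bool]],
    box_h: int,
    box_w: int,
    rng: random.Random,
    min_overlap: float = 0.9,
    max_tries: int = 1000,
) -> Optional[Box]:
    """Rejection-sample a box lying mostly inside the anatomical mask.

    Hypothetical plausibility rule: accept a box if at least `min_overlap`
    of its pixels fall inside `lung_mask`. Returns None if no box is found.
    """
    height, width = len(lung_mask), len(lung_mask[0])
    if box_h > height or box_w > width:
        return None
    for _ in range(max_tries):
        # randint is inclusive, so the box always stays within the image.
        y0 = rng.randint(0, height - box_h)
        x0 = rng.randint(0, width - box_w)
        inside = sum(
            lung_mask[y][x]
            for y in range(y0, y0 + box_h)
            for x in range(x0, x0 + box_w)
        )
        if inside >= min_overlap * box_h * box_w:
            return (x0, y0, x0 + box_w, y0 + box_h)
    return None


def box_to_inpaint_mask(box: Box, height: int, width: int) -> List[List[int]]:
    """Binary mask where 1 marks the region an inpainting model would regenerate."""
    x0, y0, x1, y1 = box
    return [
        [1 if (x0 <= x < x1 and y0 <= y < y1) else 0 for x in range(width)]
        for y in range(height)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy 8x8 "lung" region: rows 2-6, columns 1-6.
    lung = [[2 <= y < 7 and 1 <= x < 7 for x in range(8)] for y in range(8)]
    box = sample_finding_box(lung, box_h=2, box_w=3, rng=rng, min_overlap=1.0)
    mask = box_to_inpaint_mask(box, 8, 8)
    print("ground-truth box:", box)
```

In a full pipeline, `mask` and a finding-specific condition would be passed to an inpainting model, and `box` would be stored unchanged as the localization label. Rejection sampling keeps the sketch simple. A real system would likely also use finding-specific priors on size and position, which this sketch does not model.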
Primary Subject Area: Image Synthesis
Secondary Subject Area: Application: Radiology
Registration Requirement: Yes
Visa & Travel: Yes
Read CFP & Author Instructions: Yes
Originality Policy: Yes
Single-blind & Not Under Review Elsewhere: Yes
LLM Policy: Yes
Submission Number: 244