Semi-Synthetic Localization Datasets for Radiological Findings on Chest X-Rays

Published: 14 Feb 2026, Last Modified: 15 Apr 2026MIDL 2026 PosterEveryoneRevisionsBibTeXCC BY 4.0
Keywords: Semi-synthetic CXRs, Inpainting, Diffusion models
Abstract: While large datasets for chest X-ray (CXR) finding classification are widely available, datasets for finding localization are scarce. Curating these localization datasets is costly and time-intensive, requiring manual annotation by medical experts, which often results in them being small and limited in scope. To overcome this, we introduce *SemiSynCXR*, a framework designed to automatically generate semi-synthetic localization datasets. *SemiSynCXR* operates by inpainting specific radiological findings into real, healthy CXRs at anatomically plausible locations, which allows for the output of both the edited image and the ground-truth bounding box for each finding. *SemiSynCXR*-generated CXRs effectively augment existing localization datasets, yielding relative $mAP_{10:70}$ gains of up to 11\% on in-domain and 21\% on out-of-domain data, thereby mitigating data scarcity and improving generalization. Comprehensive quantitative and qualitative evaluations show that our framework achieves an overall AUROC of 0.78 and $mAP_{10:70}$ of 0.45, comparable to fully synthetic benchmarks. These results confirm that the generated findings are realistic and accurately localized, establishing *SemiSynCXR* as a practical solution for the generation of CXR finding localization datasets. Code available at the [SemiSynCXR GitHub Repository](https://github.com/anpoc/SemiSynCXR).
Primary Subject Area: Image Synthesis
Secondary Subject Area: Application: Radiology
Registration Requirement: Yes
Visa & Travel: Yes
Read CFP & Author Instructions: Yes
Originality Policy: Yes
Single-blind & Not Under Review Elsewhere: Yes
LLM Policy: Yes
Midl Latex Submission Checklist: Ensure no LaTeX errors during compilation., Replace NNN with your OpenReview submission ID., Includes \documentclass{midl}, \jmlryear{2026}, \jmlrworkshop, \jmlrvolume, \editors, and correct \bibliography command., Did not override options of the hyperref package., Did not use the times package., Use the correct spelling and format, avoid Unicode characters, and use LaTeX equivalents instead., Any math in the title and abstract must be enclosed within $...$., Did not override the bibliography style defined in midl.cls and did not use \begin{thebibliography} directly to insert references., Avoid using \scalebox; use \resizebox when needed., Included all necessary figures and removed *unused* files in the zip archive., Removed special formatting, visual annotations, and highlights used during rebuttal., All special characters in the paper and .bib file use LaTeX commands (e.g., \'e for é)., No separate supplementary PDF uploads., Acknowledgements, references, and appendix must start after the main content.
Latex Code: zip
Copyright Form: pdf
Submission Number: 244
Loading