Unpaired-to-paired data synthesis: Learning to model disease effects via contrastive analysis of neuroimaging-derived features

Sai Spandana Chintapalli; Christos Davatzikos

19 Sept 2025 (modified: 01 Jan 2026); ICLR 2026 Conference Withdrawn Submission; CC BY 4.0

Keywords: contrastive analysis, variational inference, synthetic data generation, radiomics, neuroimaging

TL;DR: We use a contrastive analysis strategy to generate synthetic paired data for neuroimaging-derived features.

Abstract: Advances in machine learning have enabled the analysis of complex, high-dimensional datasets, yet neuroimaging lags behind because of data privacy and sharing constraints. Synthetic data offers a promising solution for developing and training models. However, synthesizing disease-specific datasets is challenging, as neurological disorders induce progressive changes in the brain that are subtle and often obscured by normal brain variability. Contrastive analysis provides a framework to learn generative factors that deconvolve variation shared between background (e.g., healthy) and target (e.g., diseased) datasets from variation unique to the target, making it particularly effective for capturing and modeling subtle disease effects. In this paper, we reformulate this framework to synthesize tabular neuroimaging-derived features, specifically brain regional volumes from T1-weighted structural MRI. Given unpaired neuroimaging samples of healthy and diseased participants, we learn to generate paired healthy and diseased feature representations that emulate real disease effects. We show that paired synthesis enables fine-grained, individual-level modeling of disease effects, improves downstream analyses, and supports more precise exploration of disease heterogeneity. We validate the models on both semi-synthetic and real-world brain regional volume datasets specifically designed to highlight the heterogeneity-parsing capability of contrastive analysis. The models are available at: [link].
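
As a rough illustration of the paired-synthesis idea described in the abstract, the toy sketch below splits the latent space into shared factors (present in both healthy and diseased data) and salient factors (present only in the diseased data). It is not the authors' model: the linear decoder, the random weights, and every name (`decode`, `synthesize_pair`, `w_shared`, `w_salient`) are assumptions made for illustration, and a real contrastive model would learn the decoder from unpaired data. The point is only that decoding the same shared latent with and without the salient latent yields a healthy/diseased pair whose difference isolates an individual-level disease effect.

```python
import random


def decode(z, s, w_shared, w_salient):
    """Toy linear decoder (an assumption, not the paper's architecture).

    Each output feature is a weighted sum of the shared latents z
    plus a weighted sum of the salient (disease-specific) latents s.
    """
    return [
        sum(w_shared[j][k] * z[k] for k in range(len(z)))
        + sum(w_salient[j][k] * s[k] for k in range(len(s)))
        for j in range(len(w_shared))
    ]


def synthesize_pair(rng, w_shared, w_salient):
    """Sample one shared latent and one salient latent, then decode twice."""
    z = [rng.gauss(0.0, 1.0) for _ in range(len(w_shared[0]))]
    s = [rng.gauss(0.0, 1.0) for _ in range(len(w_salient[0]))]
    # Healthy counterpart: same shared latent, salient latent set to zero.
    healthy = decode(z, [0.0] * len(s), w_shared, w_salient)
    # Diseased counterpart: same shared latent, salient latent switched on.
    diseased = decode(z, s, w_shared, w_salient)
    return healthy, diseased, s


rng = random.Random(0)
n_features, n_shared, n_salient = 4, 2, 1  # hypothetical sizes
w_shared = [[rng.uniform(-1, 1) for _ in range(n_shared)] for _ in range(n_features)]
w_salient = [[rng.uniform(-1, 1) for _ in range(n_salient)] for _ in range(n_features)]

healthy, diseased, s = synthesize_pair(rng, w_shared, w_salient)

# Per-feature disease effect for this synthetic individual.
effect = [d - h for d, h in zip(diseased, healthy)]
```

In this linear toy, `effect` depends only on the salient latent and its weights, which is why a pair generated from one shared latent supports subject-level comparison in a way that two unrelated samples do not.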

Supplementary Material: zip

Primary Area: applications to neuroscience & cognitive science

Submission Number: 20827
