JET-Diff: Joint-Encoding Tensor Diffusion Model for Accurate DTI Reconstruction from Sparse DWIs

Junhyeok Lee; Kyu Sung Choi

19 Sept 2025 (modified: 11 Feb 2026). Submitted to ICLR 2026. License: CC BY 4.0

Keywords: Diffusion Tensor Imaging, Deep Learning, Generative Models, Latent Diffusion Models, Image Reconstruction, Neuroimaging

TL;DR: To overcome the long scan times of conventional DTI, we propose JET-Diff, a latent diffusion model that generates accurate and physically plausible 3D diffusion tensors from a minimal number of DWI scans.

Abstract: Diffusion Tensor Imaging (DTI) is an advanced magnetic resonance imaging (MRI) technique for characterizing white matter microstructure. Conventional DTI protocols require multiple diffusion-weighted imaging (DWI) acquisitions across numerous directions, resulting in long scan times, motion artifacts, patient discomfort, and reduced clinical utility. Current deep learning approaches frequently yield diffusion tensors that are anatomically inconsistent or physically implausible. We introduce Joint-Encoding Tensor Diffusion (JET-Diff), a framework that synthesizes the full six-component diffusion tensor in 3D from a highly undersampled DWI acquisition. Specifically, we propose a latent diffusion model operating on a set of coupled latent tensors derived from sparse DWIs and diffusion tensor components, which improves anatomical fidelity and encourages physically consistent tensors. JET-Diff leverages a novel anatomical autoencoder to disentangle structural information from tensor properties, yielding a compact and expressive latent space optimized for generative performance. Experiments on the Human Connectome Project (HCP) Young Adult dataset demonstrate that JET-Diff improves reconstruction accuracy and produces geometrically consistent diffusion tensors, as evidenced by SPD-aware validity metrics such as Log-Euclidean and tractography-based distances.
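For readers unfamiliar with the Log-Euclidean distance named in the abstract, the following is a minimal sketch of how such a metric is commonly computed for a single voxel, not the authors' evaluation code. It assumes a component ordering of (Dxx, Dxy, Dxz, Dyy, Dyz, Dzz) for the six-component tensor, and the function names are hypothetical. The abstract does not state the exact definition used in the paper, so the standard form is shown: the Frobenius norm of the difference between the matrix logarithms of two symmetric positive-definite (SPD) tensors.

```python
import numpy as np


def tensor_from_components(dxx, dxy, dxz, dyy, dyz, dzz):
    """Assemble a symmetric 3x3 diffusion tensor from six unique components.

    The component ordering is an assumption made for this illustration.
    """
    return np.array(
        [[dxx, dxy, dxz],
         [dxy, dyy, dyz],
         [dxz, dyz, dzz]],
        dtype=float,
    )


def is_spd(D, tol=1e-12):
    """Return True if D is symmetric with all eigenvalues above tol."""
    return bool(np.allclose(D, D.T) and np.linalg.eigvalsh(D).min() > tol)


def spd_log(D):
    """Matrix logarithm of an SPD matrix via its eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(D)
    # Scale each eigenvector column by log(eigenvalue), then recompose.
    return (eigvecs * np.log(eigvals)) @ eigvecs.T


def log_euclidean_distance(A, B):
    """Frobenius norm of log(A) - log(B); defined only for SPD inputs."""
    if not (is_spd(A) and is_spd(B)):
        raise ValueError("Log-Euclidean distance requires SPD tensors")
    return float(np.linalg.norm(spd_log(A) - spd_log(B), ord="fro"))


if __name__ == "__main__":
    # Illustrative values only (units of mm^2/s), not taken from the paper.
    reference = tensor_from_components(1.7e-3, 0.0, 0.0, 0.3e-3, 0.0, 0.3e-3)
    estimate = tensor_from_components(1.5e-3, 1e-5, 0.0, 0.4e-3, 0.0, 0.3e-3)
    print(log_euclidean_distance(reference, estimate))
```

Because the matrix logarithm is undefined for tensors with non-positive eigenvalues, a metric of this kind also acts as a validity check: a reconstructed tensor that is not SPD cannot be scored and is rejected by the sketch above.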

Supplementary Material: zip

Primary Area: applications to computer vision, audio, language, and other modalities

Submission Number: 16733
