TL;DR: Training 3D volume translation models for inverse problems that require high accuracy, such as super-resolution, is difficult. We effectively introduce 3D representations by ensembling 2D models' outputs within a 3D model.
Abstract: In volume-to-volume translation for medical images, existing models often struggle to capture the inherent volumetric distribution with 3D voxel-space representations due to high computational and dataset demands. We present Score-Fusion, a novel volumetric translation model that effectively learns 3D representations by ensembling perpendicularly trained 2D diffusion models in score-function space. By carefully initializing our model to start from an average of 2D models, as in existing methods, we reduce 3D training to a fine-tuning process, mitigating computational and data demands. Furthermore, we explicitly design the 3D model's hierarchical layers to learn ensembles of 2D features, further improving efficiency and performance. Moreover, Score-Fusion naturally extends to multi-modality settings by fusing diffusion models conditioned on different inputs for flexible, accurate integration. We demonstrate that 3D representations are essential for better performance on downstream recognition tasks such as tumor segmentation, where most segmentation models rely on 3D representations. Extensive experiments show that Score-Fusion achieves superior accuracy and volumetric fidelity in 3D medical image super-resolution and modality translation. Additionally, we extend Score-Fusion to video super-resolution by integrating 2D diffusion models on time-space slices with a spatio-temporal video diffusion backbone, highlighting its potential for general-purpose volume translation and offering broader insight into learning-based score-function fusion.
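To make the core idea concrete, here is a minimal sketch of ensembling perpendicularly trained 2D score models on a 3D volume. This is an illustration under assumed interfaces, not the paper's implementation: the function names `slicewise_score` and `fused_score` are hypothetical, and each 2D model is assumed to expose a per-slice score function; the paper's actual method additionally fine-tunes a 3D network initialized from this average.

```python
import numpy as np

def slicewise_score(volume, score_2d, axis):
    """Apply a 2D score model to every slice of `volume` along `axis`,
    reassembling the per-slice scores into a full 3D score field.
    (Hypothetical helper for illustration.)"""
    slices = np.moveaxis(volume, axis, 0)        # bring the slicing axis to the front
    scored = np.stack([score_2d(s) for s in slices])
    return np.moveaxis(scored, 0, axis)          # restore the original axis layout

def fused_score(volume, score_models):
    """Average the score fields of perpendicular 2D models in score space.
    `score_models[ax]` is the 2D model trained on slices along axis `ax`."""
    fields = [slicewise_score(volume, m, ax) for ax, m in enumerate(score_models)]
    return np.mean(fields, axis=0)

# Toy usage: with the analytic score of a standard Gaussian, s(x) = -x,
# standing in for each trained 2D diffusion model.
vol = np.random.default_rng(0).standard_normal((4, 5, 6))
toy_model = lambda s: -s
fused = fused_score(vol, [toy_model, toy_model, toy_model])
```

When the three per-axis models agree, the fused field reduces to any one of them; in practice each orientation contributes complementary volumetric structure, which the averaged field combines.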
Lay Summary: Medical images like MRIs and CT scans are 3D, but most AI tools struggle to fully understand this 3D structure without using massive computing power and large datasets. Traditional methods often simplify the problem by analyzing 2D slices, missing important information that’s only visible in 3D.
We introduce Score-Fusion, a new AI method that combines the strengths of multiple 2D image models to build an accurate 3D understanding of medical data. Instead of training a full 3D model from scratch, which is expensive and time-consuming, we fine-tune a model that cleverly merges insights from several 2D perspectives. This makes the process much more efficient. Score-Fusion can even handle multiple types of medical scans at once to give a clearer, more complete picture.
Our model improves tasks like tumor detection and medical image enhancement, which rely on understanding 3D details. It even works for video tasks by treating time as another dimension. By learning how to fuse different views smartly, Score-Fusion opens up a more flexible tool for medical imaging and beyond.
Link To Code: https://score-fusion.github.io/
Primary Area: Deep Learning->Generative Models and Autoencoders
Keywords: Diffusion models, 3D medical image generation, video generation
Submission Number: 13813