The generation of medical images presents significant challenges due to their high resolution and three-dimensional nature. Existing methods often yield suboptimal performance in generating high-quality 3D medical images, and there is currently no universal generative framework for medical imaging. In this paper, we introduce the 3D Medical Diffusion (3D MedDiffusion) model for controllable, high-quality 3D medical image generation. 3D MedDiffusion incorporates a novel, highly efficient Patch-Volume Autoencoder that compresses medical images into latent space through patch-wise encoding and recovers them back into image space through volume-wise decoding. Additionally, we design a new noise estimator to capture both local details and global structure information during the diffusion denoising process. 3D MedDiffusion can generate finely detailed, high-resolution images (up to 512×512×512) and effectively adapt to various downstream tasks, as it is trained on large-scale datasets covering CT and MRI modalities and different anatomical regions (from head to leg). Experimental results demonstrate that 3D MedDiffusion surpasses state-of-the-art methods in generative quality and exhibits strong generalizability across tasks such as sparse-view CT reconstruction, fast MRI reconstruction, and data augmentation.
Our method introduces the Patch-Volume Autoencoder, which compresses images into latent space in a patch-wise manner and decodes them back in a volume-wise manner. Diffusion and denoising are performed in this latent space. The proposed noise estimator, BiFlowNet, denoises the latent representation through two branches: an intra-patch flow, which restores each latent patch independently, and an inter-patch flow, which reconstructs the entire latent volume cohesively. A minimal sketch of this design is given below.
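The sketch below illustrates the patch-wise encoding pattern and the two-branch noise estimator in PyTorch. All module names, tensor shapes, the compression factor, the latent channel count, and the way the two flows are fused are illustrative assumptions, not the released implementation.

```python
# Hypothetical sketch, assuming a pre-built 3D encoder and two denoising sub-networks.
import torch
import torch.nn as nn


def encode_patchwise(encoder: nn.Module, volume: torch.Tensor, patch: int = 128) -> torch.Tensor:
    """Encode a (B, 1, D, H, W) volume patch by patch to limit peak GPU memory."""
    B, C, D, H, W = volume.shape
    f, latent_ch = 4, 8  # assumed spatial compression factor and latent channels
    latent = torch.zeros(B, latent_ch, D // f, H // f, W // f, device=volume.device)
    for z in range(0, D, patch):
        for y in range(0, H, patch):
            for x in range(0, W, patch):
                p = volume[:, :, z:z + patch, y:y + patch, x:x + patch]
                latent[:, :, z // f:(z + patch) // f,
                             y // f:(y + patch) // f,
                             x // f:(x + patch) // f] = encoder(p)
    return latent


class BiFlowNetSketch(nn.Module):
    """Two-branch denoiser: intra-patch flow for local detail, inter-patch flow for global structure."""

    def __init__(self, intra: nn.Module, inter: nn.Module, patch: int = 32):
        super().__init__()
        self.intra, self.inter, self.patch = intra, inter, patch

    def forward(self, z_t: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # Inter-patch flow: operates on the full latent volume for cohesive structure.
        global_eps = self.inter(z_t, t)
        # Intra-patch flow: each latent patch is restored independently.
        local_eps = torch.zeros_like(z_t)
        p = self.patch
        for z in range(0, z_t.shape[2], p):
            for y in range(0, z_t.shape[3], p):
                for x in range(0, z_t.shape[4], p):
                    local_eps[:, :, z:z + p, y:y + p, x:x + p] = \
                        self.intra(z_t[:, :, z:z + p, y:y + p, x:x + p], t)
        # Assumed additive fusion of the two flows; the actual fusion may differ.
        return global_eps + local_eps
```

Decoding follows the opposite pattern: the decoder runs once over the whole latent volume, so global consistency is preserved while the memory-heavy encoding step stays patch-local.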
The pre-trained 3D MedDiffusion can be seamlessly adapted to various downstream tasks by integrating it with ControlNet. By keeping the 3D MedDiffusion parameters fixed and applying highly efficient fine-tuning only to ControlNet, the general-purpose 3D MedDiffusion is transformed into a task-specific model.
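A minimal sketch of this adaptation loop is shown below: the pre-trained noise estimator is frozen and only the ControlNet branch is optimized on the downstream task. The module interfaces, the `control=` keyword, and the optimizer settings are assumptions for illustration.

```python
# Hypothetical ControlNet-style fine-tuning sketch, assuming the backbone accepts
# injected control features; not the authors' released training code.
import torch
import torch.nn.functional as F


def build_optimizer(base_denoiser, control_net, lr: float = 1e-4):
    # Freeze the general-purpose 3D MedDiffusion noise estimator.
    for p in base_denoiser.parameters():
        p.requires_grad_(False)
    # Only the ControlNet branch receives gradients, keeping fine-tuning lightweight.
    return torch.optim.AdamW(control_net.parameters(), lr=lr)


def training_step(base_denoiser, control_net, z_t, t, condition, noise, optimizer):
    """One fine-tuning step: the control branch injects task conditioning
    (e.g. a sparse-view CT prior or under-sampled MRI prior) into the frozen denoiser."""
    control_features = control_net(z_t, t, condition)           # task-specific guidance
    eps_pred = base_denoiser(z_t, t, control=control_features)  # frozen backbone, assumed API
    loss = F.mse_loss(eps_pred, noise)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because gradients flow only through the control branch, the same frozen backbone can serve multiple task-specific variants at a fraction of the cost of full fine-tuning.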