Geometry-Aware Attenuation Learning for Sparse-View CBCT Reconstruction

1School of Biomedical Engineering & State Key Laboratory of Advanced Medical Materials and Devices, ShanghaiTech University, Shanghai, China
2Shanghai United Imaging Intelligence Co., Ltd., Shanghai, China
3Shanghai Clinical Research and Trial Center, Shanghai, China
4School of Informatics, The University of Edinburgh, Edinburgh, UK
5Department of Computer Science, The University of Hong Kong, Hong Kong, China


Abstract


CBCT scanning and reconstruction. The CBCT scan (a) generates a sequence of 2D X-ray projections (b), which are then used to reconstruct the 3D CBCT image (c).

Cone Beam Computed Tomography (CBCT) plays a vital role in clinical imaging. Traditional methods typically require hundreds of 2D X-ray projections to reconstruct a high-quality 3D CBCT image, leading to considerable radiation exposure. This has led to growing interest in sparse-view CBCT reconstruction as a way to reduce radiation dose. While recent advances, including deep learning and neural rendering algorithms, have made strides in this area, these methods either produce unsatisfactory results or suffer from the time inefficiency of individual optimization. In this paper, we introduce a novel geometry-aware encoder-decoder framework to solve this problem. Our framework starts by encoding multi-view 2D features from the 2D X-ray projections with a 2D CNN encoder. Leveraging the geometry of CBCT scanning, it then back-projects the multi-view 2D features into 3D space to form a comprehensive volumetric feature map, which a 3D CNN decoder uses to recover the 3D CBCT image. Importantly, our approach respects the geometric relationship between the 3D CBCT image and its 2D X-ray projections during the feature back-projection stage, and benefits from prior knowledge learned from the data population. This makes it adaptable to extremely sparse-view inputs, such as only 5 or 10 X-ray projections, without individual training. Extensive evaluations on two simulated datasets and one real-world dataset demonstrate the exceptional reconstruction quality and time efficiency of our method.

Methodology


Overview of our proposed method. A 2D CNN encoder first extracts feature representations from the multi-view X-ray projections. We then build a 3D feature map through feature back-projection and adaptive feature fusion. Finally, this 3D feature map is fed into a 3D CNN decoder to produce the final CBCT image.

Sparse-view CBCT reconstruction is a highly ill-posed problem with two challenges: (1) how to bridge the dimension gap between multi-view 2D X-ray projections and the 3D CBCT image; and (2) how to compensate for the insufficient information provided by extremely sparse-view input. In this study, we introduce a geometry-aware encoder-decoder framework that solves this task efficiently. It integrates the multi-view consistency of neural rendering with the generalization ability of deep learning, addressing both challenges. Specifically, we first adopt a 2D convolutional neural network (CNN) encoder to extract multi-view 2D features from the different X-ray projections. Then, in alignment with the geometry of CBCT scanning, we back-project the multi-view 2D features into 3D space, which bridges the dimension gap while preserving multi-view consistency. Because different views offer varying degrees of information, we further introduce an adaptive feature fusion strategy to aggregate these multi-view features. The result is a 3D volumetric feature map, which a 3D CNN decoder then decodes into the 3D CBCT image. The framework's inherent geometry awareness ensures accurate information retrieval from the multi-view X-ray projections. Moreover, by capturing prior knowledge from the data population in extensive datasets, our method generalizes well across different patients without individual optimization, even with extremely sparse input such as 5 or 10 views. We have validated the effectiveness of our method on two simulated datasets (dental and spine) and one real-world dataset (walnut). Please refer to the code link for details of our datasets.
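The back-projection and fusion steps described above can be illustrated with a minimal sketch in plain Python. This is not our released implementation. It assumes an idealized circular cone-beam geometry (source orbiting the z-axis, flat detector centred on the central ray) and single-channel feature maps, and it uses a softmax-weighted average as a stand-in for the adaptive fusion strategy. All names (`project_point`, `bilinear_sample`, `back_project`, `fuse`) and parameters (`sod`, `sdd`, `pixel_size`, `scores`) are hypothetical.

```python
import math

def project_point(p, theta, sod, sdd):
    """Project 3D point p = (x, y, z) onto the detector for view angle theta.

    Assumed geometry: the source orbits the z-axis at radius `sod`
    (source-to-origin distance); the flat detector is `sdd` from the source.
    Returns detector coordinates (u, v) in the same length unit as p.
    """
    x, y, z = p
    c, s = math.cos(theta), math.sin(theta)
    depth = sod - (x * c + y * s)   # distance from source along the central ray
    mag = sdd / depth               # cone-beam magnification at this depth
    return mag * (-x * s + y * c), mag * z

def bilinear_sample(fmap, row, col):
    """Bilinearly interpolate a 2D list at fractional (row, col); 0 outside."""
    h, w = len(fmap), len(fmap[0])
    if not (0 <= row <= h - 1 and 0 <= col <= w - 1):
        return 0.0
    r0, c0 = int(math.floor(row)), int(math.floor(col))
    r1, c1 = min(r0 + 1, h - 1), min(c0 + 1, w - 1)
    fr, fc = row - r0, col - c0
    top = fmap[r0][c0] * (1 - fc) + fmap[r0][c1] * fc
    bottom = fmap[r1][c0] * (1 - fc) + fmap[r1][c1] * fc
    return top * (1 - fr) + bottom * fr

def back_project(p, fmaps, thetas, sod, sdd, pixel_size):
    """Collect one feature per view for the 3D point p."""
    feats = []
    for fmap, theta in zip(fmaps, thetas):
        u, v = project_point(p, theta, sod, sdd)
        h, w = len(fmap), len(fmap[0])
        # Assumption: the detector centre maps to the centre of the feature map.
        col = u / pixel_size + (w - 1) / 2
        row = v / pixel_size + (h - 1) / 2
        feats.append(bilinear_sample(fmap, row, col))
    return feats

def fuse(feats, scores):
    """Softmax-weighted average over views (stand-in for adaptive fusion)."""
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    return sum(w * f for w, f in zip(weights, feats)) / sum(weights)

# Usage: two 3x3 feature maps, views at 0 and 90 degrees, query at the origin.
fmaps = [[[float(r * 3 + c) for c in range(3)] for r in range(3)] for _ in range(2)]
feats = back_project((0.0, 0.0, 0.0), fmaps, [0.0, math.pi / 2],
                     sod=500.0, sdd=700.0, pixel_size=1.0)
fused = fuse(feats, scores=[0.0, 0.0])  # equal scores reduce to a plain mean
```

In a full pipeline, the same query would be repeated for every voxel of the output grid and every feature channel, and the per-view scores would come from a learned module rather than being fixed by hand.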

Results


Qualitative comparison on case #10 from the dental dataset (axial slice). Window: [-1000, 2000] HU. For more results, please refer to our paper.

Qualitative comparison with two current state-of-the-art (SOTA) methods, SNAF and DIF-Net, on case #9 from the dental dataset. From top to bottom: axial, coronal, and sagittal slices. Window: [-1000, 2000] HU. For more results, please refer to our paper.

Quantitative comparison on the dental dataset. The best performance is shown in bold. For more results, please refer to our paper.

Citation


@ARTICLE{SVCT,
  author={Liu, Zhentao and Fang, Yu and Li, Changjian and Wu, Han and Liu, Yuan and Shen, Dinggang and Cui, Zhiming},
  journal={IEEE Transactions on Medical Imaging},
  title={Geometry-Aware Attenuation Learning for Sparse-View CBCT Reconstruction},
  year={2024},
  doi={10.1109/TMI.2024.3473970}
}