Cone Beam Computed Tomography (CBCT) plays a vital role in clinical imaging. Traditional methods typically require hundreds of 2D X-ray projections to reconstruct a high-quality 3D CBCT image, leading to considerable radiation exposure. This has led to a growing interest in sparse-view CBCT reconstruction to reduce radiation doses. While recent advances, including deep learning and neural rendering algorithms, have made strides in this area, these methods either produce unsatisfactory results or suffer from the time inefficiency of per-case optimization. In this paper, we introduce a novel geometry-aware encoder-decoder framework to solve this problem. Our framework starts by encoding multi-view 2D features from various 2D X-ray projections with a 2D CNN encoder. Leveraging the geometry of CBCT scanning, it then back-projects the multi-view 2D features into 3D space to formulate a comprehensive volumetric feature map, followed by a 3D CNN decoder to recover the 3D CBCT image. Importantly, our approach respects the geometric relationship between the 3D CBCT image and its 2D X-ray projections during the feature back-projection stage, and benefits from prior knowledge learned from the data population. This ensures its adaptability in dealing with extremely sparse-view inputs without per-case training, such as scenarios with only 5 or 10 X-ray projections. Extensive evaluations on two simulated datasets and one real-world dataset demonstrate the exceptional reconstruction quality and time efficiency of our method.
Sparse-view CBCT reconstruction is a highly ill-posed problem with twofold challenges: (1) how to bridge the dimension gap between multi-view 2D X-ray projections and the 3D CBCT image; (2) how to compensate for the insufficient information introduced by extremely sparse-view input. In this study, we introduce a geometry-aware encoder-decoder framework to solve this task efficiently. It seamlessly integrates the multi-view consistency of neural rendering and the generalization ability of deep learning, effectively addressing the challenges mentioned above. Specifically, we first adopt a 2D convolutional neural network (CNN) encoder to extract multi-view 2D features from different X-ray projections. Then, in alignment with the geometry of CBCT scanning, we back-project the multi-view 2D features into 3D space, which properly bridges the dimension gap while preserving multi-view consistency. In particular, since different views offer varying amounts of information, an adaptive feature fusion strategy is further introduced to aggregate these multi-view features. Consequently, a 3D volumetric feature is constructed and then decoded into the 3D CBCT image with a 3D CNN decoder. Our framework's inherent geometry awareness ensures accurate information retrieval from multi-view X-ray projections. Moreover, by capturing population-level prior knowledge from extensive datasets, our method generalizes well across different patients without individual optimization, even with extremely sparse input views, such as 5 or 10 views. We validate the effectiveness of our method on two simulated datasets (dental and spine) and one real-world dataset (walnut). Please refer to the code link for details of our datasets.
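To make the pipeline concrete, the sketch below illustrates the two geometry-related steps described above: projecting every voxel center onto each X-ray detector plane to sample per-view 2D features, and fusing the per-view volumes with learned adaptive weights. This is a minimal PyTorch illustration under assumed shapes and conventions; the function and module names (back_project_features, AdaptiveFusion), the normalized voxel grid, and the softmax-based weighting are hypothetical stand-ins, not the exact implementation released with the paper.

```python
# Minimal sketch (PyTorch, hypothetical names and shapes) of geometry-aware
# back-projection and adaptive multi-view feature fusion.
import torch
import torch.nn as nn
import torch.nn.functional as F


def back_project_features(feats_2d, proj_mats, volume_res=64):
    """feats_2d:  (V, C, H, W)  multi-view 2D feature maps from the encoder
       proj_mats: (V, 3, 4)     cone-beam projection matrices (world -> pixel)
       returns:   (V, C, D, D, D) per-view volumetric features."""
    V, C, H, W = feats_2d.shape
    device = feats_2d.device

    # Voxel centers of a normalized cube in homogeneous world coordinates.
    grid = torch.linspace(-1.0, 1.0, volume_res, device=device)
    z, y, x = torch.meshgrid(grid, grid, grid, indexing="ij")
    voxels = torch.stack([x, y, z, torch.ones_like(x)], dim=-1)   # (D, D, D, 4)
    voxels = voxels.view(-1, 4).t()                               # (4, N)

    # Perspective projection of every voxel onto every detector plane.
    pix = proj_mats @ voxels                                      # (V, 3, N)
    uv = pix[:, :2] / pix[:, 2:3].clamp(min=1e-6)                 # (V, 2, N)

    # Normalize pixel coordinates to [-1, 1] for grid_sample.
    uv[:, 0] = uv[:, 0] / (W - 1) * 2 - 1
    uv[:, 1] = uv[:, 1] / (H - 1) * 2 - 1
    uv = uv.permute(0, 2, 1).view(V, volume_res ** 3, 1, 2)       # (V, N, 1, 2)

    # Bilinearly sample each view's 2D features at the projected locations.
    sampled = F.grid_sample(feats_2d, uv, align_corners=True)     # (V, C, N, 1)
    return sampled.view(V, C, volume_res, volume_res, volume_res)


class AdaptiveFusion(nn.Module):
    """Learned per-view, per-voxel weights: views carrying more information
       contribute more to the fused volumetric feature (a simple stand-in
       for the adaptive feature fusion strategy)."""
    def __init__(self, channels):
        super().__init__()
        self.score = nn.Conv3d(channels, 1, kernel_size=1)

    def forward(self, per_view_vols):                             # (V, C, D, D, D)
        w = torch.softmax(self.score(per_view_vols), dim=0)       # (V, 1, D, D, D)
        return (w * per_view_vols).sum(dim=0, keepdim=True)       # (1, C, D, D, D)
```

The fused volumetric feature would then be passed to a 3D CNN decoder to produce the CBCT image; because the sampling follows the scanner's projection geometry, each voxel only aggregates 2D features that are geometrically consistent with it across views.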