CrystalSeg: Automating Synchrotron Tomographic Reconstruction Segmentation for Crystallography with Physically Guided Simulations

17 Sept 2025 (modified: 29 Nov 2025), ICLR 2026 Conference Withdrawn Submission, CC BY 4.0
Keywords: AI4S, segmentation, synthetic dataset, tomography_reconstruction
Abstract: 3D segmentation of tomographic volumes is a critical bottleneck in long-wavelength X-ray crystallography, a technique crucial for drug development and for validating structural models from systems like AlphaFold3. This segmentation is a prerequisite for ray-tracing absorption correction, a necessary step in processing data from these experiments. However, it is currently performed manually by experts, a slow and costly process that prevents full automation of the scientific pipeline. The primary barrier to automation is the prohibitive expense and difficulty of collecting annotated segmentation data. To address this data scarcity, we present **CrystalSeg**, a novel, GPU-accelerated simulation and segmentation pipeline. It generates large amounts of annotated data by simulating synchrotron X-ray tomography images and their corresponding reconstructed 3D volumes. We demonstrate that segmentation networks trained on CrystalSeg's synthetic data substantially outperform models trained on limited real data, with **improvements of 29.2% in Recall, 30.5% in IoU, and 24.9% in F1 score** for finding the crystal. CrystalSeg reduces the expert labor required for segmentation from hours to minutes. More importantly, it enables, for the first time, a fully automated solution for ray-tracing absorption correction in long-wavelength crystallography, making this advanced structural biology technique more scalable and accessible.
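The abstract reports Recall, IoU, and F1 for the crystal class without defining them. As a minimal sketch, assuming the standard voxel-wise definitions for a single foreground class, the three scores could be computed as below. The function name `crystal_metrics` and the toy labels are hypothetical illustrations, not part of CrystalSeg.

```python
def crystal_metrics(pred, truth):
    """Voxel-wise Recall, IoU, and F1 for one foreground class.

    `pred` and `truth` are flat sequences of 0/1 labels (a 3D volume
    flattened voxel by voxel). Hypothetical helper, not CrystalSeg code.
    """
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same number of voxels")
    tp = fp = fn = 0
    for p, t in zip(pred, truth):
        if p and t:
            tp += 1          # crystal voxel correctly labeled
        elif p and not t:
            fp += 1          # background labeled as crystal
        elif t and not p:
            fn += 1          # crystal voxel missed
    # Each score falls back to 0.0 when its denominator is empty.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    iou = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    return {"recall": recall, "iou": iou, "f1": f1}


# Toy 64-voxel volume: 8 true crystal voxels, 12 predicted, all 8 covered.
truth = [1] * 8 + [0] * 56
pred = [1] * 12 + [0] * 52
scores = crystal_metrics(pred, truth)
```

In this toy case the prediction covers every crystal voxel but over-segments by four voxels, so Recall stays at 1.0 while IoU and F1 drop. This is why the three scores are usually reported together.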
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 9902