Confirmation: I have read and agree with the workshop's policy on behalf of myself and my co-authors.
Track: long paper (4–8 pages excluding references)
Keywords: representation learning, image retrieval, contrastive loss, cryo-EM
Abstract: Cryo-electron microscopy (cryo-EM) enables the visualization of proteins in distinct structural states, but extracting robust information from experimental images remains challenging. In particular, learning useful representations of 2D class averages is limited by the scarcity of annotated real datasets, which constrains both model training and benchmarking. To address this gap, we introduce an image retrieval task for cryo-EM 2D class averages and propose a two-stage, domain-mixed training paradigm. In the first stage, the model is pretrained on easily generated synthetic 2D class averages to learn initial feature representations. In the second stage, it is fine-tuned on a small mixed dataset of synthetic and real class averages to adapt to experimental variability. We demonstrate that this approach enables effective retrieval under limited and imbalanced data conditions and significantly outperforms models trained only on real images. Our work establishes a scalable framework for bridging synthetic and experimental data in cryo-EM, with the potential to accelerate downstream structural analysis such as protein identification.
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 65