Keywords: AI4Science, self-supervised pre-training, benchmark datasets, transfer learning
TL;DR: This work aims to extend the success of self-supervised pre-training on natural images to astronomical data.
Abstract: This work does not describe a novel method. Instead, it aims to extend the success of self-supervised pre-training on natural images to astronomical data.
To address the lack of comprehensive benchmarks in astronomy, we first curate an unlabeled pre-training dataset together with multiple downstream datasets covering typical astronomical tasks.
Through extensive experiments, we demonstrate that our pre-training scheme has the following advantages.
Representation transferability: pre-training followed by fine-tuning not only significantly boosts performance but also reduces the number of training epochs compared to training from scratch on downstream tasks (e.g., a 12% accuracy improvement and an 83% reduction in epochs for galaxy classification), mirroring trends in natural image domains.
Cross-instrument generalization: our pre-trained model generalizes across telescope instruments and outperforms domain-specific models.
Value of domain-specific pre-training data: in-domain pre-training data further improves model performance, surpassing models trained on general datasets such as ImageNet and on datasets from other domains.
Furthermore, we explore the scaling of Vision Transformers (ViTs) in astronomy by varying model size and data volume, offering insights and practical experience for developing vision foundation models in astronomy.
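To make the pre-train-then-fine-tune scheme described above concrete, the following is a minimal sketch in PyTorch. It is not the authors' released code: the timm ViT backbone, the checkpoint path astro_ssl_pretrained.pth, the class count, and the hyperparameters are all illustrative assumptions standing in for a self-supervised astronomical pre-training run and a downstream galaxy classification task.

```python
# Minimal sketch of pre-training-followed-by-fine-tuning (illustrative only).
# The checkpoint path, dataset, class count, and hyperparameters below are
# hypothetical placeholders, not the paper's actual configuration.
import torch
import torch.nn as nn
import timm  # assumed available; provides standard ViT implementations

NUM_CLASSES = 10  # hypothetical number of galaxy morphology classes

# Build a ViT backbone and load self-supervised pre-trained weights
# (e.g., from a masked-image-modeling run on unlabeled astronomical images).
model = timm.create_model("vit_base_patch16_224", num_classes=NUM_CLASSES)
state = torch.load("astro_ssl_pretrained.pth", map_location="cpu")  # assumed checkpoint
model.load_state_dict(state, strict=False)  # classification head stays newly initialized

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.05)
criterion = nn.CrossEntropyLoss()

def fine_tune_one_epoch(loader, device="cuda"):
    """Run one fine-tuning epoch on a labeled downstream dataset."""
    model.to(device).train()
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```

Compared with training the same architecture from random initialization, this is the workflow under which the abstract reports higher downstream accuracy with far fewer fine-tuning epochs.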
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Supplementary Material: zip
Submission Number: 1287