Galaxy Zoo Evo: 1 million human-annotated images of galaxies

07 May 2025 (modified: 30 Oct 2025) · Submitted to NeurIPS 2025 Datasets and Benchmarks Track · CC BY-NC 4.0
Keywords: galaxies, supervised, images, dataset
TL;DR: Large labelled dataset of galaxy images, designed for training and evaluating foundation models
Abstract: We introduce Galaxy Zoo Evo, a labeled dataset for building and evaluating foundation models on images of galaxies. GZ Evo includes 104M crowdsourced labels for 823k images from four telescopes. Each image is labeled with a series of fine-grained questions and answers (e.g. "featured galaxy, two spiral arms, tightly wound, merging with another galaxy"). These detailed labels are useful for pretraining or finetuning. We also include four smaller sets of labels (167k galaxies in total) for downstream tasks of specific interest to astronomers, including finding strong lenses and describing galaxies from the new space telescope Euclid. We hope GZ Evo will serve as a real-world benchmark for computer vision topics such as domain adaptation (from terrestrial to astronomical, or between telescopes) or learning under uncertainty from crowdsourced labels. We also hope it will support a new generation of foundation models for astronomy; such models will be critical to future astronomers seeking to better understand our universe.
Croissant File: json
Dataset URL: https://huggingface.co/datasets/mwalmsley/gz_evo
Code URL: https://github.com/mwalmsley/gz-evo
Supplementary Material: pdf
Primary Area: Datasets & Benchmarks for applications in computer vision
Submission Number: 724
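
The abstract mentions learning under uncertainty from crowdsourced labels, where each image carries volunteer answers to a question rather than a single class. As a minimal sketch of one common way to handle this, the snippet below turns raw vote counts into smoothed soft targets using the posterior mean under a symmetric Dirichlet prior. The answer names, the vote counts, and the helper `vote_fractions` are hypothetical illustrations; they are not taken from the GZ Evo schema or codebase.

```python
from typing import Dict


def vote_fractions(counts: Dict[str, int], alpha: float = 1.0) -> Dict[str, float]:
    """Return smoothed answer fractions for one question on one image.

    Computes (n_k + alpha) / (N + K * alpha), the posterior mean under a
    symmetric Dirichlet(alpha) prior. alpha=0 gives plain vote fractions.
    """
    if alpha < 0:
        raise ValueError("alpha must be non-negative")
    total = sum(counts.values()) + alpha * len(counts)
    if total == 0:
        raise ValueError("no votes and no prior mass: fractions are undefined")
    return {answer: (n + alpha) / total for answer, n in counts.items()}


# Hypothetical votes from ten volunteers on a single image (not real data).
votes = {"smooth": 3, "featured": 6, "artifact": 1}

raw = vote_fractions(votes, alpha=0.0)   # plain fractions of the votes cast
soft = vote_fractions(votes, alpha=1.0)  # pulled slightly toward uniform
```

Smoothing matters most for images with few votes, where a raw fraction of 1.0 from a single volunteer would otherwise be treated as a certain label.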