Benchmarking foundation models for unsupervised discovery in large multimodal astrophysical datasets

Published: 03 Mar 2026 · Last Modified: 26 Apr 2026 · ICLR 2026 Workshop FM4Science Poster · CC BY 4.0
Keywords: anomaly detection, astronomical surveys, multimodal self-supervised models, representation learning
Abstract: We benchmark pretrained multimodal foundation model embeddings for discovery in large scientific datasets through unsupervised anomaly detection. Using a matched imaging–spectroscopy sample from the Euclid and DESI astronomical surveys, we compare three self-supervised representation approaches: autoregressive modeling (AstroPT), contrastive image–spectrum alignment (AstroCLIP), and encoder-decoder multimodal transformers (AION), with only lightweight adaptation. We introduce a scalable pipeline based on density estimation in embedding space and multimodal anomaly isolation, combining per-modality rarity with explicit cross-modal misalignment. A cross-model ranking-transfer analysis shows that the most extreme outliers are often shared across models, while their relative prioritization strongly depends on training objectives and inductive biases, indicating that anomaly definitions are representation-relative. Qualitative inspection suggests that unimodal density tails are frequently dominated by instrumental artefacts, whereas multimodal fusion increases the prevalence of physically coherent candidates such as active galactic nuclei and gravitational lens candidates. Finally, lightweight probing tasks reveal how different embeddings trade predictive accuracy against linear decodability and effective predictive dimensionality. Together, these results provide practical guidance for deploying foundation model embeddings for discovery in upcoming large-scale survey data.
Submission Number: 57
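The abstract describes a pipeline that combines per-modality rarity (density estimation in embedding space) with explicit cross-modal misalignment. A minimal sketch of that scoring idea is below; it is illustrative only, not the paper's actual pipeline. The k-nearest-neighbour distance as a rarity proxy, cosine distance as the misalignment term, and the z-scored sum used for fusion are all assumptions, and the embeddings here are random placeholders standing in for AstroPT/AstroCLIP/AION outputs.

```python
import numpy as np

rng = np.random.default_rng(0)
# Placeholder embeddings: 300 matched objects, 32-dim image and spectrum vectors.
# In practice these would come from a pretrained foundation model encoder.
img = rng.normal(size=(300, 32))
spec = rng.normal(size=(300, 32))

def knn_rarity(emb, k=10):
    """Per-modality rarity proxy: distance to the k-th nearest neighbour
    in embedding space (larger distance = lower local density = rarer)."""
    d = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=-1)
    d.sort(axis=1)
    return d[:, k]  # column 0 is the zero self-distance

def cross_modal_misalignment(a, b):
    """Cosine distance between matched image and spectrum embeddings:
    high values flag objects whose two modalities disagree."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return 1.0 - np.sum(a * b, axis=1)

def zscore(x):
    """Standardize so heterogeneous score scales can be summed."""
    return (x - x.mean()) / x.std()

# Fuse per-modality rarity with explicit cross-modal misalignment
# (equal-weight z-scored sum is an assumed fusion rule).
score = (zscore(knn_rarity(img))
         + zscore(knn_rarity(spec))
         + zscore(cross_modal_misalignment(img, spec)))
top = np.argsort(score)[::-1][:20]  # top-20 anomaly candidates for inspection
```

For real survey-scale data, the brute-force pairwise distance matrix would be replaced by an approximate nearest-neighbour index, but the fusion logic stays the same.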