Evaluating Sparse Galaxy Simulations via Out-of-Distribution Detection and Amortized Bayesian Model Comparison

Published: 09 Oct 2024, Last Modified: 15 May 2025. Venue: Machine Learning and the Physical Sciences Workshop, NeurIPS 2024. License: CC BY 4.0
Abstract: Cosmological simulations are a powerful tool for advancing our understanding of galaxy formation, and many simulations model key properties of real galaxies. A question that naturally arises for such simulations in light of high-quality observational data is: How close are the models to reality? Due to the high dimensionality of the problem, many previous studies evaluate galaxy simulations using simplified summary statistics of physical properties. In this work, we combine simulation-based Bayesian model comparison with a novel misspecification detection technique to compare simulated galaxy images from 6 hydrodynamical models against real Sloan Digital Sky Survey (SDSS) observations. Since cosmological simulations are computationally costly, we address the problem of low simulation budgets by first training a k-sparse variational autoencoder (VAE) on the abundant dataset of SDSS images. The VAE learns to extract informative latent embeddings and delineates the typical set of real images. To reveal simulation gaps, we then perform out-of-distribution (OOD) detection based on the logits of classifiers trained on the embeddings of simulated images. Finally, we perform amortized Bayesian model comparison using probabilistic classification, identifying the relatively best-performing model along with partial explanations through SHAP values.
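As a rough illustration of the three ingredients named in the abstract, the following minimal Python sketch shows a k-sparse constraint on a latent vector, a logit-based OOD score, and posterior model probabilities obtained from classifier logits. It is not the authors' implementation. The function names, the choice of keeping the k largest-magnitude latents, the negative-max-logit score, and the uniform prior over models are all assumptions made for illustration; the abstract does not specify any of them.

```python
import math


def k_sparse(z, k):
    """Zero all but the k largest-magnitude entries of a latent vector.

    Assumption: sparsity is enforced by magnitude; the paper's VAE may
    use a different selection rule.
    """
    if k >= len(z):
        return list(z)
    order = sorted(range(len(z)), key=lambda i: abs(z[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(z)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def posterior_model_probs(logits):
    """Approximate posterior model probabilities from classifier logits.

    Assumption: the classifier was trained on equal numbers of
    simulations per model (uniform prior) and is well calibrated, so
    its softmax output approximates p(model | embedding).
    """
    return softmax(logits)


def max_logit_ood_score(logits):
    """Hypothetical OOD score: the negative maximum logit.

    Higher values mean no simulated class claims the input strongly,
    which hints that the input lies outside the simulated distribution.
    """
    return -max(logits)


# Toy example: one 4-dimensional embedding and logits for 3 models.
z = k_sparse([0.1, -2.0, 0.5, 1.5], k=2)
probs = posterior_model_probs([1.0, 3.0, 2.0])
score = max_logit_ood_score([1.0, 3.0, 2.0])
```

In a setting like the one described, the classifier would be trained on embeddings of simulated images and then applied to embeddings of real SDSS images, so that both the OOD score and the model probabilities are evaluated on observations.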