Universally Converging Representations of Matter Across Scientific Foundation Models

Published: 20 Sept 2025, Last Modified: 05 Nov 2025 · AI4Mat-NeurIPS-2025 Poster · CC BY 4.0
Keywords: representation, MLIP, representation alignment, foundation model
TL;DR: Representations of matter in scientific foundation models are converging.
Abstract: Scientific foundation models are rapidly emerging across physics, chemistry, and biology, yet it remains unclear whether they converge toward a shared representation of matter or remain governed by domain and modality. We analyze embeddings from 49 models spanning molecules, materials, and proteins, using two complementary alignment metrics to probe their learned representations. We find modest cross-modality alignment for molecules and materials but strong alignment among protein models, and we find that the training dataset, rather than the architecture, is the dominant factor shaping latent spaces. Nontrivial cross-modal alignment, together with strong alignment within modalities, hints that models may be converging toward an optimal representation space. However, models align more strongly out-of-distribution than in-distribution, suggesting they remain data-limited and fall short of true foundation status. Our framework establishes representation alignment as a dynamic criterion for evaluating foundation-level generality in scientific models. This is an abbreviated, work-in-progress submission of our full manuscript, which will be linked in the comments below shortly.
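The abstract does not name its two alignment metrics. As an illustration only, the sketch below assumes one global and one local metric that are common in representation-alignment work: linear centered kernel alignment (CKA) and mutual k-nearest-neighbor overlap. Both compare two models' embeddings of the same set of structures. The function names `linear_cka` and `mutual_knn` are hypothetical, and this is not the authors' implementation.

```python
import numpy as np


def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """Linear CKA between two embedding matrices (assumed metric, for illustration).

    X: (n, d1) embeddings of n structures from model A.
    Y: (n, d2) embeddings of the same n structures, same order, from model B.
    """
    # Center each feature column so the score ignores per-model offsets.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    # With linear kernels, HSIC reduces to Frobenius norms of (cross-)covariances.
    cross = np.linalg.norm(Y.T @ X, "fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, "fro")
    norm_y = np.linalg.norm(Y.T @ Y, "fro")
    return float(cross / (norm_x * norm_y))


def mutual_knn(X: np.ndarray, Y: np.ndarray, k: int = 5) -> float:
    """Mean fraction of shared k nearest neighbors across two embedding spaces."""

    def knn_sets(Z: np.ndarray) -> list:
        # Pairwise squared Euclidean distances between all samples.
        sq = np.sum(Z**2, axis=1)
        d = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T
        np.fill_diagonal(d, np.inf)  # a sample is not its own neighbor
        idx = np.argsort(d, axis=1)[:, :k]
        return [set(row.tolist()) for row in idx]

    nx, ny = knn_sets(X), knn_sets(Y)
    # For each sample, count how many of its k neighbors agree across models.
    return float(np.mean([len(a & b) / k for a, b in zip(nx, ny)]))
```

Both scores lie in [0, 1], and neither requires the two models to share an embedding dimension. CKA summarizes global geometry and is invariant to rotations and isotropic scaling. Mutual k-NN checks only whether local neighborhoods agree, which is one reason two such metrics can be treated as complementary.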
Submission Track: Paper Track (Short Paper)
Submission Category: All of the above
Submission Number: 151