Element2Vec: Build Chemical Element Representation from Text for Property Prediction

ICLR 2026 Conference Submission 11018 Authors

18 Sept 2025 (modified: 26 Nov 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: AI4MaterSci, test-time training, chemical element representation from text, sparse data
TL;DR: Generate vector representations of chemical elements from Wikipedia text to enable reliable property discovery and speed up materials design.
Abstract: Accurate property data for chemical elements is crucial for materials design and manufacturing, but many properties are difficult to measure directly due to equipment constraints. Traditional methods predict a property from the same property of other elements, or from related properties, via numerical analysis, but they often fail to model complex relationships; moreover, not all characteristics can be represented as scalars. Recent efforts have explored advanced AI tools such as language models for property estimation, but these still suffer from hallucinations and a lack of interpretability. In this paper, we investigate Element2Vec to effectively represent chemical elements from natural language to support research in the natural sciences. Given text parsed from Wikipedia pages, we use language models to generate both a single general-purpose embedding (Global) and a set of attribute-highlighted vectors (Local). Beyond the complex relationships among elements, computational challenges also arise from 1) the discrepancy in text distribution between common descriptions and specialized scientific texts, and 2) the extremely limited data: with only 118 known elements, data for specific properties are often highly sparse and incomplete. We therefore also design a test-time training method based on self-attention that clearly mitigates the prediction error of vanilla regression. We hope this work can pave the way for advancing AI-driven discovery in materials science.
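The abstract measures its self-attention test-time training method against vanilla regression on the element embeddings. The snippet below is a minimal sketch of one plausible form of such a baseline. It assumes each element already has a fixed embedding vector and that a given property is known for only a subset of the 118 elements, with missing values stored as NaN. It fits closed-form ridge regression on the known entries and predicts the rest. The function names `fit_ridge`, `predict`, and `impute_property` and the regularization strength `lam` are hypothetical. This is not the paper's method, and the paper's exact baseline may differ.

```python
import numpy as np


def fit_ridge(X, y, lam=1.0):
    """Closed-form ridge regression with an unpenalized bias term (illustrative)."""
    n, d = X.shape
    Xb = np.hstack([X, np.ones((n, 1))])  # append a constant column for the bias
    reg = lam * np.eye(d + 1)
    reg[-1, -1] = 0.0  # do not penalize the bias
    return np.linalg.solve(Xb.T @ Xb + reg, Xb.T @ y)


def predict(w, X):
    """Apply fitted weights (last entry is the bias) to embedding rows."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ w


def impute_property(embeddings, values, lam=1.0):
    """Hypothetical helper: fill NaN entries of `values` (one per element)
    by regressing the known entries on their element embeddings."""
    known = ~np.isnan(values)
    w = fit_ridge(embeddings[known], values[known], lam)
    filled = values.copy()  # known values are kept unchanged
    filled[~known] = predict(w, embeddings[~known])
    return filled
```

For example, with `embeddings` of shape `(118, d)` and `values` holding NaN for elements whose property was never measured, `impute_property(embeddings, values)` returns a complete vector. With very few labeled elements and high-dimensional text embeddings, such a regression is easily under-determined, which is the sparsity problem the abstract describes.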
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 11018