Element2Vec: Build Chemical Element Representation from Text for Property Prediction

Yuanhao Li; Keyuan Lai; Tianqi Wang; Qihao Liu; Jiawei Ma; Yuan-Chao Hu

18 Sept 2025 (modified: 11 Feb 2026); Submitted to ICLR 2026; CC BY 4.0

Keywords: AI4MaterSci, test-time training, chemical elements representation from text, sparse data

TL;DR: Generate vector representations of chemical elements from Wikipedia text for reliable property discovery, to speed up materials design

Abstract: Accurate property data for chemical elements is crucial for materials design and manufacturing, but many properties are difficult to measure directly due to equipment constraints. Traditional methods predict a property from the properties of other elements, or from related properties, via numerical analysis, but they often fail to model complex relationships; after all, not all characteristics can be represented as scalars. Recent efforts have explored advanced AI tools such as language models for property estimation, but these still suffer from hallucinations and a lack of interpretability. In this paper, we investigate Element2Vec to effectively represent chemical elements from natural language to support research in the natural sciences. Given the text parsed from Wikipedia pages, we use language models to generate both a single general-purpose embedding (Global) and a set of attribute-highlighted vectors (Local). Beyond the complicated relationships across elements, computational challenges also exist because of 1) the discrepancy in text distribution between common descriptions and specialized scientific texts, and 2) the extremely limited data: with only 118 known elements, data for specific properties is often highly sparse and incomplete. Thus, we also design a test-time training method based on self-attention to clearly mitigate the prediction error caused by vanilla regression. We hope this work could pave the way for advancing AI-driven discovery in materials science.

Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)

Submission Number: 11018
