Keywords: AI4MaterSci, test-time training, chemical elements representation from text, sparse data
TL;DR: Generate vector representations of chemical elements from Wikipedia text for reliable property discovery to speed up materials design
Abstract: Accurate property data for chemical elements is crucial for materials design and
manufacturing, but many properties are difficult to measure directly due to equipment
constraints. While traditional methods use the properties of other elements, or
related properties for prediction via numerical analyses, they often fail to model
complex relationships. After all, not all characteristics can be represented as
scalars. Recent efforts have explored advanced AI tools such as language
models for property estimation, but these still suffer from hallucinations and a
lack of interpretability. In this paper, we investigate Element2Vec to effectively
represent chemical elements from natural language to support research in the
natural sciences. Given the text parsed from Wikipedia pages, we use language
models to generate both a single general-purpose embedding (Global) and a set of
attribute-highlighted vectors (Local). Beyond the complicated relationships across
elements, computational challenges also exist because of 1) the discrepancy
in text distribution between common descriptions and specialized scientific texts,
and 2) the extremely limited data, i.e., with only 118 known elements, data for
specific properties is often highly sparse and incomplete. Thus, we also design
a test-time training method based on self-attention to clearly mitigate the
prediction error of vanilla regression. We hope this work could pave the way for
advancing AI-driven discovery in materials science.
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 11018