Text-Twin-Translation (T$^{3}$): A Full-Stack Machine Learning Framework for Functional Material-Device Systems Discovery
Keywords: Large Language Models, Graph Neural Networks, Materials Informatics, Field-Effect Transistors, Automatic Prompt Engineering, Virtual Screening
TL;DR: We propose T3 (Text-Twin-Translation), an end-to-end pipeline that mines scientific publications via automatic prompt optimization and trains a topology-aware GNN digital twin to screen 123M molecules for PFAS sensor probes, with DFT validation.
Abstract: The rational design of functional material-device systems remains bottlenecked by the combinatorial complexity of materials, interfaces, and processing conditions, a challenge further amplified by the scarcity of structured, device-level datasets. We introduce Text-Twin-Translation (T$^{3}$), a full-stack framework that integrates Large Language Model (LLM)-assisted literature mining with physics-embedded machine learning for sensor design. Using field-effect transistor (FET) sensors as a testbed, we make three contributions. First, we deploy textual gradient-based automatic prompt engineering to steer open-source LLMs, achieving up to 21.8% BLEU and 17.3% ROUGE-1 improvements over human-designed instructions when extracting 28 structured fields from over 1,600 publications. Second, we propose a Device Topology-Embedded Graph Neural Network (DTE-GNN) that encodes device components within a physics-aware heterogeneous graph, attaining 87.7%, 85.1%, and 92.3% accuracy in predicting the lower detection limit, upper detection limit, and sensitivity, respectively, and outperforming 17 tabular and neural baselines. Third, we perform virtual screening over 123 million PubChem molecules, where in silico validation shows that a model-identified probe candidate exhibits stronger PFOS$^{-}$ (perfluorooctanesulfonate) selectivity over the interferents TCA$^{-}$ (trichloroacetate) and DDS$^{-}$ (dodecylsulfonate), with binding energy differences $\Delta\Delta E = -0.23/-0.31\,\mathrm{eV}$ compared to $+0.68/+0.54\,\mathrm{eV}$ for the experimentally validated $\beta$-Cyclodextrin baseline. T$^{3}$ achieves state-of-the-art performance in translating unstructured scientific literature into actionable device-level insights and enables closed-loop discovery.
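As a minimal sketch, assuming $\Delta\Delta E$ is defined as the probe's binding energy with PFOS$^{-}$ minus its binding energy with a given interferent (binding energies negative when favorable, so a negative $\Delta\Delta E$ favors PFOS$^{-}$), the selectivity comparison above reduces to a sign check on each $\Delta\Delta E$. The sign convention and the function names `delta_delta_e` and `is_selective` are hypothetical illustrations, not taken from the T$^{3}$ codebase; only the $\Delta\Delta E$ values come from the abstract.

```python
from typing import Dict


def delta_delta_e(e_bind_target: float, e_bind_interferent: float) -> float:
    """Relative binding energy (eV) of the target vs. one interferent.

    Assumed convention (hypothetical): binding energies are negative when
    binding is favorable, so a negative result means the probe binds the
    target more strongly than the interferent.
    """
    return e_bind_target - e_bind_interferent


def is_selective(ddes: Dict[str, float]) -> bool:
    """True if the probe favors the target over every listed interferent."""
    return all(dde < 0 for dde in ddes.values())


# Delta-Delta-E values reported in the abstract (eV); target is PFOS-.
candidate = {"TCA-": -0.23, "DDS-": -0.31}
beta_cyclodextrin = {"TCA-": 0.68, "DDS-": 0.54}

print(is_selective(candidate))          # True
print(is_selective(beta_cyclodextrin))  # False
```

Requiring a negative $\Delta\Delta E$ against every interferent, rather than on average, reflects that a sensor probe is only useful if no single interferent outcompetes the target.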
Submission Track: Full Paper
Submission Category: AI-Guided Design + Automated Synthesis
Supplementary Material: pdf
Submission Number: 39