PolyBind: Effectively Combining Datasets Indexed in Different Representations of Polymers

Published: 20 Sept 2025, Last Modified: 05 Nov 2025
Venue: AI4Mat-NeurIPS-2025 Poster
License: CC BY 4.0
Keywords: Polymer Informatics, Contrastive learning, Multi-modal representations, Material discovery, Self-supervised learning
TL;DR: We present PolyBind, a framework that uses contrastive learning to align different polymer representations within a shared latent space.
Abstract: In polymer informatics, datasets covering the same material properties are often available, but they index polymers in different representations, which makes it difficult to combine them meaningfully for machine learning (ML) models. This heterogeneity limits the predictive power of ML for material discovery. Here, we introduce PolyBind, a framework that leverages contrastive learning to align different polymer representations (PSMILES, polymer names, and BigSMILES) within a shared latent space. PolyBind treats PSMILES as the anchor representation and maps polymer names and BigSMILES into the same embedding space, yielding a unified representation with richer chemical information than traditional fingerprint vectors. We demonstrate PolyBind's effectiveness on glass transition temperature prediction by successfully combining datasets that use different polymer notations. Our framework offers a robust solution for integrating diverse polymer data sources.
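To make the anchor-based alignment idea concrete, here is a minimal sketch of how embeddings of polymer names and BigSMILES strings could be pulled toward the embeddings of the matching PSMILES strings. The abstract does not state which loss, encoders, or hyperparameters PolyBind uses, so everything here is an assumption for illustration only: a symmetric InfoNCE loss, a temperature of 0.07, hypothetical function names (`info_nce`, `anchored_alignment_loss`), and hand-written toy vectors standing in for encoder outputs.

```python
import math


def normalize(vector):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector)) or 1.0
    return [x / norm for x in vector]


def cross_entropy_diag(logits):
    """Mean cross-entropy where the correct column for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_z = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def info_nce(anchor, other, temperature=0.07):
    """Symmetric InfoNCE loss between two batches of paired embeddings.

    Row i of `anchor` and row i of `other` are assumed to describe the
    same polymer; every other row in the batch serves as a negative.
    """
    a = [normalize(v) for v in anchor]
    b = [normalize(v) for v in other]
    logits = [
        [sum(x * y for x, y in zip(u, w)) / temperature for w in b]
        for u in a
    ]
    transposed = [list(column) for column in zip(*logits)]
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(transposed))


def anchored_alignment_loss(psmiles, names, bigsmiles, temperature=0.07):
    """Align names and BigSMILES to the PSMILES anchor (assumed objective).

    Only anchor-to-modality pairs are compared; names and BigSMILES are
    never contrasted with each other directly.
    """
    return (info_nce(psmiles, names, temperature)
            + info_nce(psmiles, bigsmiles, temperature))


# Toy embeddings for three polymers (illustrative values, not real encoder outputs).
psmiles_emb = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
name_emb = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]
bigsmiles_emb = [[0.8, 0.0, 0.2], [0.0, 0.8, 0.2], [0.2, 0.0, 0.8]]

aligned = anchored_alignment_loss(psmiles_emb, name_emb, bigsmiles_emb)
# Rotating the name rows breaks the pairing, so the loss should increase.
mismatched = anchored_alignment_loss(
    psmiles_emb, name_emb[1:] + name_emb[:1], bigsmiles_emb
)
print(f"aligned loss: {aligned:.4f}, mismatched loss: {mismatched:.4f}")
```

In this sketch, correctly paired rows give a lower loss than rows whose pairing has been rotated, which is the signal a contrastive objective would use to place all three notations in one embedding space. A downstream property model, such as a glass transition temperature regressor, could then be trained on embeddings from any of the three notations.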
Submission Track: Paper Track (Full Paper)
Submission Category: AI-Guided Design
Institution Location: Jena, Germany
AI4Mat Journal Track: Yes
AI4Mat RLSF: Yes
Submission Number: 57