Multimodal Masked Polymer Autoencoder for Unified Polymer Informatics

Published: 20 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · License: CC BY 4.0
Keywords: Polymer Informatics, Multimodal Learning, Scientific discovery, Data-driven polymer development, Multi-view representation learning
Abstract: Recent advances in large-scale sequence modeling have opened new opportunities for polymer informatics, enabling both property prediction from structures and inverse design of structures from desired properties. Most existing approaches, however, model these tasks as separate mappings, limiting their flexibility and robustness. We propose a multimodal representation learning framework that unifies diverse polymer informatics tasks within a single model. Our approach treats each property or structural element as an individual submodality and introduces an information-theoretic objective that balances informativeness across arbitrary subsets of modalities. The resulting Multimodal Masked Polymer Autoencoder (MMPAE) serves as an end-to-end foundation model, supporting both cross-modal generation and retrieval. Extensive experiments on large polymer datasets show that MMPAE not only surpasses strong task-specific baselines under realistic missing-value conditions, but also provides a flexible platform for diverse downstream applications within a unified architecture.
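The masking scheme described in the abstract — treating each property or structural element as a submodality and reconstructing arbitrary hidden subsets from the visible ones — can be illustrated with a minimal sketch. All names, dimensions, and the mean-pooling "decoder" below are assumptions for illustration only; they are not the paper's actual architecture or objective.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical submodalities: structural elements and properties of a polymer
# (names and dimensions are illustrative assumptions, not from the paper).
MODALITIES = ["backbone", "sidechain", "Tg", "density"]
DIM = 8

def mask_subset(sample, mask_ratio=0.5, rng=rng):
    """Randomly hide a subset of modalities; the model must reconstruct them."""
    names = list(sample)
    n_mask = max(1, int(len(names) * mask_ratio))
    masked = set(rng.choice(names, size=n_mask, replace=False))
    visible = {k: v for k, v in sample.items() if k not in masked}
    targets = {k: v for k, v in sample.items() if k in masked}
    return visible, targets

def reconstruct(visible):
    """Stand-in decoder: predict each masked vector as the mean of visible ones.
    A real model would use a learned encoder-decoder in its place."""
    return np.stack(list(visible.values())).mean(axis=0)

# One training step's worth of data: mask, reconstruct, score on masked parts.
sample = {m: rng.standard_normal(DIM) for m in MODALITIES}
visible, targets = mask_subset(sample)
pred = reconstruct(visible)
loss = np.mean([np.mean((pred - t) ** 2) for t in targets.values()])
```

Because the masked subset varies per sample, a single model is trained to map any subset of modalities to any other, which is what lets one architecture cover both property prediction (structure visible, properties masked) and inverse design (properties visible, structure masked).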
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 24109