Keywords: synthetic polymer, random heteropolymer, k-mer representation, functional materials
Abstract: Designing synthetic macromolecules with targeted functions is a long-standing challenge in materials science. Random heteropolymers (RHPs) and their blends provide a vast combinatorial design space whose physicochemical behavior emerges from short-range monomer interactions rather than global sequence order. However, the absence of explicit sequence information and the difficulty of simulating disordered polymer ensembles make structure–property prediction difficult.
Here, we introduce a $k$-mer representation learning framework for modeling and optimizing functional random copolymers, trained on data acquired from an autonomous robotic blending platform. The platform performs high-throughput synthesis, testing, and iterative optimization of RHP blends for protein stabilization, generating more than $10^3$ labelled experiments across closed-loop campaigns. Each polymer or blend is encoded as a concatenated $k$-mer fingerprint that captures segment-level statistics of monomer connectivity derived from stochastic polymerization models.
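A minimal sketch of how such a fingerprint could be computed, under assumptions the abstract does not confirm: each RHP is modeled as a stationary first-order Markov chain over monomer types with a transition matrix `P` (terminal-model copolymerization statistics, chain ends ignored), $k$-mer probabilities are obtained by chaining transition probabilities from the stationary distribution, and a blend's fingerprint is the weight-averaged fingerprint of its components. All function names are hypothetical and this is not the authors' implementation.

```python
import itertools
import numpy as np


def stationary_distribution(P):
    """Left eigenvector of P for eigenvalue 1, normalized to sum to 1."""
    evals, evecs = np.linalg.eig(P.T)
    idx = np.argmin(np.abs(evals - 1.0))
    pi = np.real(evecs[:, idx])
    return pi / pi.sum()  # dividing by the sum also fixes the sign


def kmer_probabilities(P, k):
    """Probability of every length-k monomer word under a stationary Markov chain.

    Words are enumerated in itertools.product order, e.g. AA, AB, BA, BB for k=2.
    """
    n = P.shape[0]
    pi = stationary_distribution(P)
    probs = []
    for word in itertools.product(range(n), repeat=k):
        p = pi[word[0]]
        for a, b in zip(word[:-1], word[1:]):
            p *= P[a, b]  # probability that monomer b follows monomer a
        probs.append(p)
    return np.array(probs)


def kmer_fingerprint(P, k_max=3):
    """Concatenate the k-mer probability vectors for k = 1..k_max."""
    return np.concatenate([kmer_probabilities(P, k) for k in range(1, k_max + 1)])


def blend_fingerprint(transition_matrices, weights, k_max=3):
    """Weighted average of component fingerprints (assumed linear mixing rule)."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    fps = np.stack([kmer_fingerprint(P, k_max) for P in transition_matrices])
    return w @ fps
```

For example, a Bernoullian 50:50 copolymer (`P = [[0.5, 0.5], [0.5, 0.5]]`) and an alternation-prone one (`P = [[0.1, 0.9], [0.9, 0.1]]`) share the same $k=1$ slice, which is all a composition-only encoding sees, but their dimer probabilities differ (0.25 each versus 0.05, 0.45, 0.45, 0.05 for AA, AB, BA, BB). With two monomer types and `k_max=3`, the concatenated fingerprint has $2 + 4 + 8 = 14$ entries.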
We demonstrate that the resulting $k$-mer features outperform one-hot composition encodings in predictive accuracy and reveal non-additive, physically interpretable correlations such as charge-pattern complementarity. The $k$-mer-based model also generalizes across different experimental setups. This work shows how physics-grounded statistical representations of polymer structure can bridge experimental data and machine learning, providing a scalable framework for physically faithful modeling of disordered soft-matter systems.
Submission Number: 36