Keywords: Machine Learning, Variational Inference, Autoregressive Models, Molecular Generation
Abstract: Text-based autoregressive models (ARMs) are popular for generating SMILES (Simplified Molecular Input Line Entry System) strings because of their simplicity and state-of-the-art performance, but they typically generate tokens in a fixed left-to-right order. Because the best generation order for SMILES is less obvious than for natural text, we developed LO-ARM (Learning-Order ARM), which learns a data-dependent generation order. Evaluated on ChEMBL, LO-ARM learns consistent, meaningful orderings that reveal molecular substructures, and it matches or surpasses state-of-the-art models, making it a well-balanced and competitive option.
Submission Number: 140