Keywords: Extrapolation, Molecule generation, Desired property, Diffusion-based models, Large language models
Abstract: Generative models, such as diffusion-based models and large language models, have become increasingly popular in cheminformatics research. These models have shown promise in accelerating molecular discovery. However, they are hindered by data scarcity and struggle to accurately generate molecules when the desired properties lie outside the range of the training data, a task known as tail extrapolation in statistics. To address this, we propose a tail-extrapolative generative model. The key idea is to adapt pre-additive noise models, which can provably perform tail extrapolation in classical regression tasks, to a variety of conditional generative models. Across empirical studies, we find that tail-extrapolative generative models exhibit improved extrapolation capabilities: they generate molecules whose properties align more closely with the desired targets. Furthermore, these models enhance the diversity of the generated molecules compared to existing approaches, representing an advancement in molecular design.
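To illustrate the pre-additive noise idea mentioned in the abstract, here is a minimal, hypothetical sketch: the conditioning property is perturbed with noise *before* it enters the conditional generator (in contrast to the usual post-additive noise added to the output of a regression model). The architecture, the placeholder reconstruction loss, and all names are assumptions for illustration only, not the authors' implementation or training objective.

```python
# Hypothetical sketch of pre-additive noise conditioning; not the paper's code.
import torch
import torch.nn as nn

class ConditionalGenerator(nn.Module):
    """Toy conditional generator: maps (latent z, scalar property y) to a sample."""
    def __init__(self, latent_dim=16, out_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + 1, 64), nn.ReLU(),
            nn.Linear(64, out_dim),
        )

    def forward(self, z, y):
        return self.net(torch.cat([z, y], dim=-1))

def training_step(model, x, y, noise_scale=0.1):
    """One step with pre-additive noise: perturb the conditioning property y
    before it reaches the generator, then fit the generated sample to x."""
    z = torch.randn(x.size(0), 16)
    y_noisy = y + noise_scale * torch.randn_like(y)   # noise added pre-network, on the condition
    x_hat = model(z, y_noisy)
    return ((x_hat - x) ** 2).mean()                  # placeholder loss; the actual objective differs

# Usage on synthetic stand-ins for molecular features and a scalar property.
model = ConditionalGenerator()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x, y = torch.randn(128, 32), torch.rand(128, 1)
loss = training_step(model, x, y)
loss.backward(); opt.step()
```

Because the noise passes through the generator's nonlinearity rather than being appended to its output, the learned conditional distribution can extend smoothly beyond the observed property range, which is the intuition behind the claimed tail-extrapolation behavior.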
Submission Number: 195