Exploring Chemical Space with LLM Reasoning

Published: 28 Apr 2026, Last Modified: 28 Apr 2026
Venue: MSLD 2026 Poster
License: CC BY 4.0
Keywords: LLM Reasoning, Inverse Molecular Design, Molecular Generation
Abstract: Reasoning models hold considerable potential for accelerating molecular discovery, yet progress is limited by the absence of dedicated datasets. Existing chemical datasets typically provide descriptions of molecular structures, which are less informative than the step-by-step reasoning needed to guide models through complex and diverse molecular design tasks. We propose a dataset that captures complete step-wise reasoning trajectories, building from basic substructures to complete molecules for given properties, each annotated with textual explanations. Training on these compositional reasoning steps allows models to explore chemical space by recombining reasoning paths and to generalize to multi-objective design. The dataset is intended to advance the development of reasoning-centric chemical models and to facilitate structural understanding in next-generation foundation models.
Submission Number: 13