Chemical Language Models Have Problems with Chemistry: A Case Study on Molecule Captioning Task

Published: 19 Mar 2024, Last Modified: 15 May 2024. Venue: Tiny Papers @ ICLR 2024 Notable. License: CC BY 4.0
Keywords: Probing, chemical language models, molecule augmentation
Abstract: The recent fusion of molecular sciences and natural language processing has driven significant advances in drug discovery. Because molecule representation is central to chemical understanding in these models, we introduce novel probing tests designed to evaluate the knowledge of molecular structure in state-of-the-art language models (LMs), specifically MolT5 and Text+Chem T5. The probing tests are conducted on a molecule captioning task to gather evidence about, and insight into, how well the language models comprehend chemical information. By applying rules that transform a molecule's SMILES string into equivalent variants, we observe significant differences in the natural language descriptions the LM generates for a given molecule, depending on the exact transformation used.
Supplementary Material: zip
Submission Number: 229
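The abstract's idea of rewriting a SMILES string into an equivalent variant can be illustrated with a minimal sketch. The rule shown here, shifting ring-closure labels by a constant, is an assumption chosen because it needs no chemistry toolkit; it is not necessarily one of the transformations used in the paper. The helper name `relabel_ring_closures` and its `offset` parameter are hypothetical. Ring-closure labels only pair up atoms, so renaming them consistently leaves the molecule unchanged.

```python
DIGITS = "0123456789"


def relabel_ring_closures(smiles: str, offset: int = 1) -> str:
    """Shift every single-digit ring-closure label by `offset`.

    Hypothetical helper for illustration only. Digits inside bracket
    atoms (isotopes, hydrogen counts, charges) are left untouched, and
    two-digit labels written as %nn are rejected rather than handled.
    """
    out = []
    in_bracket = False
    for ch in smiles:
        if ch == "%":
            raise ValueError("two-digit ring labels (%nn) are not handled in this sketch")
        if ch == "[":
            in_bracket = True
        elif ch == "]":
            in_bracket = False
        if ch in DIGITS and not in_bracket:
            new_label = int(ch) + offset
            if not 0 <= new_label <= 9:
                raise ValueError("shifted label falls outside the single-digit range")
            out.append(str(new_label))
        else:
            out.append(ch)
    return "".join(out)


# Benzene written with ring label 1, then with ring label 2: same molecule.
original = "c1ccccc1"
variant = relabel_ring_closures(original)
print(original, variant)
```

A probing test in the spirit of the abstract would then pass both `original` and `variant` to the captioning model and compare the generated descriptions; a model that captures molecular structure rather than surface tokens should describe the two strings the same way.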