Keywords: recall, language models, molecular language models, sampling methods for language models
TL;DR: We measure the recall of language models trained on molecular datasets
Abstract: Most current benchmarks evaluate generative language models on the accuracy of the generated output. However, in some scenarios it is also important to evaluate the recall of the generations, i.e., whether a model can generate all correct outputs, such as all security vulnerabilities of a given codebase. Evaluating recall poses two challenges: the lack of complete sets of correct outputs for any task, and the existence of many distinct but similar outputs (e.g., two exploits that target the same vulnerability).
In this paper, we propose a benchmark from the domain of small organic molecules. We define several sets of molecules of varying complexity and fine-tune language models on subsets of those sets. We attempt to generate as many molecules from the target sets as possible and measure the recall, i.e., the percentage of molecules in the target set that the model generates. We examine the impact of the training loss function and sampling strategy on the recall. We propose a sampling strategy based on beam search that avoids duplicates and maximizes recall. Finally, we show that, given a small validation set, one can predict the recall of the model without actually generating many samples, which can serve as a model selection strategy for maximizing generation recall.
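As a minimal illustration of the recall measure described above (not the authors' implementation), the sketch below counts how many distinct target molecules appear among the generated samples. It assumes molecules are represented as strings that have already been canonicalized, so that string equality means molecule equality; the function name `generation_recall` and the example strings are hypothetical.

```python
from typing import Iterable, Set


def generation_recall(generated: Iterable[str], target: Set[str]) -> float:
    """Fraction of target molecules that appear at least once in `generated`.

    Assumes every string is already in a canonical form, so equal strings
    denote the same molecule. Duplicates in `generated` do not add to recall.
    """
    if not target:
        raise ValueError("target set must be non-empty")
    # Deduplicate the generations and keep only those inside the target set.
    hits = target.intersection(generated)
    return len(hits) / len(target)


# Hypothetical example: 4 target molecules, 4 samples, one duplicate,
# one sample outside the target set.
target_set = {"CCO", "CCN", "CCC", "CCCl"}
samples = ["CCO", "CCO", "CCN", "C1CC1"]
example_recall = generation_recall(samples, target_set)
```

Here two of the four target molecules are recovered, so `example_recall` is 0.5. The repeated sample contributes nothing, which is why a sampling strategy that avoids duplicates can raise recall for a fixed generation budget.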
Primary Area: foundation or frontier models, including LLMs
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 14268