Evaluating Chemistry Prompts for Large-Language Model Fine-Tuning

Published: 08 Oct 2024, Last Modified: 03 Nov 2024 · AI4Mat-NeurIPS-2024 · CC BY 4.0
Submission Track: LLMs for Materials Science - Short Paper
Submission Category: AI-Guided Design + Automated Material Characterization
Keywords: Large Language Models, Fine-Tuning, Prompting, Templating, Benchmarking
TL;DR: A diverse set of chemistry-related templates and data representations is used to fine-tune an LLM, which is then evaluated on its memorization and generalization performance.
Abstract: We study large language model (LLM) templating and data presentation in chemistry and materials science by analyzing the memorization and generalization performance of a LLaMA model fine-tuned on 34 unique datasets. As application domains for LLMs become more specialized, it becomes increasingly important to understand the impact of training data, templates, and evaluations. While many pretrained LLMs have observed enormous corpora of text, they are not guaranteed to be useful on domain-specific tasks that involve specialized data and prompts, such as those in chemistry and materials science. To further probe the capabilities of LLMs, we study the performance of various fine-tuned base models and show how differences in template style and molecular string representation affect model performance. We hope these insights serve as a helpful path toward future larger-scale training of chemistry- and materials-science-specific LLMs.
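To make the templating idea concrete, the sketch below shows one way a dataset record could be rendered into fine-tuning prompts under several template styles and molecular string representations. The template names, representations, and record fields here are illustrative assumptions, not the paper's actual templates or datasets:

```python
# Hypothetical sketch of prompt templating for chemistry fine-tuning.
# Template styles and representation keys are illustrative, not the
# paper's actual choices.

TEMPLATES = {
    "question": "What is the {property} of the molecule {mol}?",
    "instruction": "Predict the {property} for the following molecule.\nMolecule: {mol}\nAnswer:",
    "completion": "The {property} of {mol} is",
}

def render_prompts(record, representations=("smiles", "selfies", "iupac_name")):
    """Render one dataset record into every template x representation pair."""
    prompts = []
    for rep in representations:
        mol = record.get(rep)
        if mol is None:
            continue  # skip representations missing from this record
        for style, template in TEMPLATES.items():
            prompts.append({
                "template": style,
                "representation": rep,
                "text": template.format(property=record["property"], mol=mol),
            })
    return prompts

# Example record (ethanol) with two of the three representations present
record = {
    "property": "aqueous solubility",
    "smiles": "CCO",
    "iupac_name": "ethanol",
}
for p in render_prompts(record):
    print(p["representation"], "|", p["template"], "|", p["text"])
```

Sweeping over such template-by-representation combinations is one way to measure how prompt form, independent of content, affects memorization and generalization.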
Submission Number: 43