Empirical Observations on Parameter Scaling in Chemical Language Models

Artur Safrastyan; Hrant Khachatrian

Published: 28 May 2026, Last Modified: 03 Jun 2026
Venue: ICML 2026 FM4LS Workshop Poster
License: CC BY 4.0

Keywords: Chemical Language Models, Parameter Scaling, Molecular Property Prediction, Cheminformatics, SMILES Representation

TL;DR: This empirical evaluation of autoregressive chemical language models shows that increasing parameter count can improve downstream performance, but the scaling benefits are strongly task-dependent.

Abstract: Recent studies have shown that increasing parameter count improves the performance of graph-based architectures for molecular property prediction. In this work, we examine whether similar scaling behavior holds for autoregressive chemical language models trained on SMILES-based datasets. We evaluate three LLaMA-based models (170M, 380M, and 1.3B parameters) on diverse downstream tasks, including multi-task ADME regression from Polaris, single-task PXR induction prediction, and large-scale binding classification on a DNA-encoded library dataset (BELKA). Larger models achieve better performance in some settings, particularly regression tasks, but this trend is not consistent across benchmarks. On the PXR induction task, performance improved consistently with model size. On the Polaris benchmark, by contrast, the effect of scale varied from task to task. On BELKA, larger models improved performance on the public test set, but these gains did not carry over consistently to the private test set, which is more out-of-distribution. These findings suggest that scaling benefits can extend to chemical language models but depend strongly on both task characteristics and the fine-tuning strategy used. We hope this work motivates further research with larger and more diverse model variants to better understand scaling trends in chemical language models.
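The abstract distinguishes tasks where performance improves consistently with model size (PXR induction) from tasks where the effect of scale varies. The sketch below shows one possible way to operationalize that distinction for a single task, given one score per model size (for example 170M, 380M, and 1.3B parameters). It is not the authors' evaluation code: the function names `classify_scaling` and `log_scaling_slope`, the label strings, and the log-linear slope heuristic are all hypothetical choices made for illustration. For error metrics where lower is better (e.g. MAE), pass `higher_is_better=False`.

```python
import math
from typing import Sequence


def classify_scaling(param_counts: Sequence[float],
                     scores: Sequence[float],
                     higher_is_better: bool = True) -> str:
    """Hypothetical helper: label how one task's metric changes with model size.

    Returns "consistent gain" if every larger model strictly beats the next
    smaller one, "consistent loss" if every larger model is strictly worse,
    and "non-monotonic" otherwise (ties also count as non-monotonic).
    """
    if len(param_counts) != len(scores) or len(scores) < 2:
        raise ValueError("need at least two (size, score) pairs of equal length")
    # Sort by parameter count so comparisons always run small -> large.
    pairs = sorted(zip(param_counts, scores))
    # Flip the sign for error metrics so that a positive delta always means "better".
    sign = 1.0 if higher_is_better else -1.0
    deltas = [sign * (larger[1] - smaller[1])
              for smaller, larger in zip(pairs, pairs[1:])]
    if all(d > 0 for d in deltas):
        return "consistent gain"
    if all(d < 0 for d in deltas):
        return "consistent loss"
    return "non-monotonic"


def log_scaling_slope(param_counts: Sequence[float],
                      scores: Sequence[float]) -> float:
    """Hypothetical helper: least-squares slope of score vs. log10(parameter count)."""
    xs = [math.log10(p) for p in param_counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(scores) / len(scores)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    den = sum((x - mean_x) ** 2 for x in xs)
    if den == 0:
        raise ValueError("parameter counts must not all be equal")
    return num / den
```

A strict monotonicity check is deliberately conservative with only three model sizes: a single reversal is enough to mark a task as non-monotonic, while the slope gives a coarser summary of the overall trend per decade of parameters.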

Email Sharing: We authorize the sharing of all author emails with Program Chairs.

Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.

Submission Number: 83

Loading