MolRAG: Unlocking the Power of Large Language Models for Molecular Property Prediction

ACL ARR 2025 February Submission1655 Authors

14 Feb 2025 (modified: 09 May 2025)ACL ARR 2025 February SubmissionEveryoneRevisionsBibTeXCC BY 4.0
Abstract: Recent LLMs exhibit limited effectiveness on molecular property prediction task due to the semantic gap between molecular representations and natural language, as well as the lack of domain-specific knowledge. To address these challenges, we propose MolRAG, a Retrieval-Augmented Generation framework integrating Chain-of-Thought reasoning for molecular property prediction. MolRAG operates by retrieving structurally analogous molecules as contextual references to guide stepwise knowledge reasoning through chemical structure-property relationships. This dual mechanism synergizes molecular similarity analysis with structured inference, while generating human-interpretable rationales grounded in domain knowledge. Experimental results show MolRAG outperforms pre-trained LLMs on four datasets, and even matches supervised methods, achieving performance gains of 1.1\%–45.7\% over direct prediction approaches, demonstrating versatile effectiveness.
Paper Type: Long
Research Area: Special Theme (conference specific)
Research Area Keywords: Large Language Models, Retrieval Augmented Generation, LLMs, RAG
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English
Submission Number: 1655
Loading