MolRA: Molecule-Graph Guided Parameter Space Alignment for Molecular Multimodal Large Language Models
Keywords: Molecular Multimodal Large Language Models, Parameter Space Alignment, Structure-Aware Molecular Reasoning
Abstract: Most existing molecular multimodal large language models rely on input space alignment, where molecular graphs are represented as sequences of continuous embeddings and combined with text prompts into a unified input sequence. This paradigm flattens complex, branched molecular topologies into graph tokens, so the LLM must reconstruct spatial relationships within the shared context window through intensive self-attention, which can lead to structural fragmentation and increased computational overhead. In this paper, we propose MolRA, a novel molecule-graph guided parameter space alignment approach. Instead of prepending graph tokens, our approach employs a graph-guided weight generator to transform molecular structural features into molecule-graph guided weight signals, which are then injected directly into the decoder layers of a frozen LLM. This shift decouples molecular graph integration from the input stream and enables molecular structural reasoning without input sequence expansion. Experimental results demonstrate that the proposed approach outperforms recent input-alignment-based molecular multimodal LLMs in both chemical accuracy and instruction-following efficiency. It achieves unified instruction tuning across 11 tasks, attaining state-of-the-art performance on 8 of them, and offers a novel perspective on multimodal integration within scientific domains.
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: multimodal applications
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 6065
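The parameter space alignment idea described in the abstract can be illustrated with a minimal, dependency-free sketch. Everything below is an assumption made for illustration: the abstract does not specify the form of the weight signals, so this sketch assumes a low-rank additive update to a single frozen weight matrix, and the names make_generator, generate_delta, and layer_forward are hypothetical rather than taken from the paper.

```python
import random


def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def matmul(A, B):
    """Multiply matrix A (n x k) by matrix B (k x m)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def make_generator(graph_dim, hidden_dim, rank, seed=0):
    """Hypothetical weight generator: two linear maps from a pooled graph
    feature to the flattened low-rank factors A (rank x hidden) and
    B (hidden x rank). The low-rank form is an assumption of this sketch."""
    rng = random.Random(seed)

    def rand(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "to_A": rand(rank * hidden_dim, graph_dim),
        "to_B": rand(hidden_dim * rank, graph_dim),
        "hidden_dim": hidden_dim,
        "rank": rank,
    }


def generate_delta(gen, graph_feat):
    """Turn one molecule's graph feature into a hidden x hidden weight update."""
    d, r = gen["hidden_dim"], gen["rank"]
    a_flat = matvec(gen["to_A"], graph_feat)
    b_flat = matvec(gen["to_B"], graph_feat)
    A = [a_flat[i * d:(i + 1) * d] for i in range(r)]  # rank x hidden
    B = [b_flat[i * r:(i + 1) * r] for i in range(d)]  # hidden x rank
    return matmul(B, A)


def layer_forward(W_frozen, delta, hidden_states):
    """Apply (W_frozen + delta) to each token state; W_frozen is not modified."""
    W_eff = [[w + dw for w, dw in zip(wr, dr)] for wr, dr in zip(W_frozen, delta)]
    return [matvec(W_eff, h) for h in hidden_states]


# Toy usage: 5 text tokens, hidden size 4, graph feature size 3, rank 2.
rng = random.Random(1)
W_frozen = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(4)]
text_states = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(5)]
graph_feat = [0.5, -0.2, 0.8]  # stand-in for a pooled graph-encoder output

gen = make_generator(graph_dim=3, hidden_dim=4, rank=2)
delta = generate_delta(gen, graph_feat)
out = layer_forward(W_frozen, delta, text_states)
```

In this sketch the molecule influences the output only through the generated weight update, so the number of token states passing through the layer equals the number of text tokens. Under input space alignment, the same layer would instead process the text tokens plus one state per graph token.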