Keywords: Multimodal Modeling, Graph–LLM Alignment, Molecule Understanding, Backbone-Free Tuning
TL;DR: EDT-Former uses entropy-guided dynamic query tokens to map molecular graphs into an LLM's input space, capturing both local and global structural features for comprehensive understanding and reasoning with backbone-free, connector-only training.
Abstract: Molecular understanding is central to advancing scientific domains such as drug discovery, yet large language models (LLMs) struggle to interpret molecular graphs effectively. Existing graph–LLM bridges often adapt a Q-Former–style connector with fixed-length static tokens originally designed for vision tasks. These designs overlook stereochemistry and substructural context and typically require costly LLM-backbone fine-tuning, limiting efficiency and generalization. We introduce EDT-Former, an Entropy-guided Dynamic Token Transformer that generates tokens aligned with informative molecular patches, preserving both local and global structural features for molecular graph understanding. Unlike prior approaches, EDT-Former aligns frozen graph encoders with LLMs without tuning the LLM backbone, making fine-tuning computationally efficient. It achieves state-of-the-art results on the MoleculeQA and Mol-Instructions benchmarks, underscoring its effectiveness for scalable and generalizable multimodal molecular understanding.
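Since the connector-only setup is only described at a high level here, a minimal PyTorch sketch of what such a design could look like follows. The class name EDTConnector, the feature-entropy selection heuristic, the fixed token budget, and all dimensions are assumptions made for illustration, not the paper's actual architecture or training recipe.

```python
import torch
import torch.nn as nn


class EDTConnector(nn.Module):
    """Hypothetical connector sketch: scores graph patches by feature entropy,
    keeps an illustrative top-k subset, and cross-attends learnable queries
    to the kept patches before projecting into the LLM embedding space."""

    def __init__(self, graph_dim=300, llm_dim=2048, num_queries=32, num_heads=4):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, graph_dim))
        self.cross_attn = nn.MultiheadAttention(graph_dim, num_heads, batch_first=True)
        self.proj = nn.Linear(graph_dim, llm_dim)

    def forward(self, patch_emb):
        # patch_emb: (B, N, graph_dim) patch embeddings from a frozen graph encoder.
        B, N, _ = patch_emb.shape
        # Per-patch entropy over a softmax of its own features: an illustrative
        # stand-in for the paper's entropy-guided selection criterion.
        p = patch_emb.softmax(dim=-1)
        entropy = -(p * p.clamp_min(1e-9).log()).sum(dim=-1)  # (B, N)
        k = max(1, N // 2)  # assumed fixed budget; the paper's selection is dynamic
        keep_idx = entropy.topk(k, dim=-1).indices
        keep = torch.zeros(B, N, dtype=torch.bool, device=patch_emb.device)
        keep.scatter_(1, keep_idx, True)
        # Cross-attend learnable queries to the selected patches only.
        q = self.queries.unsqueeze(0).expand(B, -1, -1)
        out, _ = self.cross_attn(q, patch_emb, patch_emb, key_padding_mask=~keep)
        return self.proj(out)  # (B, num_queries, llm_dim): soft tokens for the LLM


if __name__ == "__main__":
    connector = EDTConnector()
    patches = torch.randn(2, 50, 300)  # stand-in for frozen-GNN patch embeddings
    tokens = connector(patches)
    print(tokens.shape)  # torch.Size([2, 32, 2048])
    # Connector-only training: both backbones stay frozen (requires_grad_(False));
    # only the connector's parameters go to the optimizer.
    optimizer = torch.optim.AdamW(connector.parameters(), lr=1e-4)
```

The key property this sketch preserves from the abstract is that gradients flow only through the connector, so neither the graph encoder nor the LLM backbone is updated during fine-tuning.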
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 15761