Abstract: Understanding multimodal molecular knowledge is crucial for advancing biomedicine, chemistry, and materials science. Molecule language models (MoLMs) have become powerful tools in these domains, integrating structural representations (e.g., SMILES strings, 2D graphs) with contextual descriptions (e.g., physicochemical properties, biomedical applications). However, MoLMs can encode and propagate inaccuracies due to low-quality training data or malicious manipulation. While model editing has been explored for general-domain AI, its application to MoLMs remains uncharted, presenting unique challenges due to the multifaceted and interdependent nature of molecular knowledge. In this paper, we take the first step toward MoLM editing for two critical tasks: molecule-to-caption generation and caption-to-molecule generation. To address molecule-specific challenges, we propose MolEdit, a novel framework that enables targeted modifications while preserving unrelated molecular knowledge. To systematically evaluate editing performance, we introduce MEBench, a comprehensive benchmark assessing multiple dimensions, including reliability, locality, and generality. Extensive experiments on MEBench highlight the distinct challenges of MoLM editing and demonstrate MolEdit's superiority over existing methods.
Paper Type: Long
Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond
Research Area Keywords: cross-modal application, cross-modal content generation, cross-modal machine translation, multimodality, healthcare applications
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Reproduction study, Publicly available software and/or pre-trained models, Data resources, Data analysis
Languages Studied: English
Submission Number: 6109