Abstract: Molecule captioning (mol-to-text) and text-based molecule generation (text-to-mol) represent two complementary challenges in computational chemistry. The former involves describing the structure and potential applications of a given molecule, for example for scientific education, while the latter focuses on generating molecular structures that match specified properties and applications. Existing studies have neither acknowledged nor exploited the duality between these cross text-molecule tasks, leaving it largely unexplored. We hypothesize that jointly optimizing both tasks can improve overall performance by exploiting this inherent synergy. In this paper, we propose a dual training framework designed to capture the correlation between text-to-mol and mol-to-text. Regularization terms are incorporated to guide both models toward consistent probability distributions across the two tasks during training. We evaluated our approach on the ChEBI-20 benchmark. The experimental results show notable accuracy improvements on both the text-to-molecule and molecule-to-text tasks when our dual training strategy is applied.
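The abstract's consistency regularization can be illustrated with a standard dual-supervised-learning objective: the joint log-probability of a (text, molecule) pair admits two factorizations, and a penalty on their disagreement couples the two models. The sketch below is an illustrative assumption, not the paper's actual loss; the function name `dual_loss` and the weight `lam` are hypothetical.

```python
def dual_loss(log_p_text, log_p_mol_given_text,
              log_p_mol, log_p_text_given_mol,
              lam=0.1):
    """Sketch of a joint dual-training objective (illustrative, not
    the paper's exact formulation).

    Sums the negative log-likelihoods of both directions and adds a
    duality regularizer penalizing the gap between the two
    factorizations of log p(text, mol):
        log p(text) + log p(mol | text)  vs.
        log p(mol)  + log p(text | mol)
    """
    nll_t2m = -log_p_mol_given_text   # text-to-mol NLL
    nll_m2t = -log_p_text_given_mol   # mol-to-text NLL
    gap = (log_p_text + log_p_mol_given_text
           - log_p_mol - log_p_text_given_mol)
    return nll_t2m + nll_m2t + lam * gap ** 2

# Example with dummy log-probabilities (marginals would come from
# language/molecule priors, conditionals from the two models):
loss = dual_loss(-2.0, -3.0, -2.5, -2.4)
```

When the two factorizations agree, the gap term vanishes and the objective reduces to ordinary maximum-likelihood training of each direction.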
External IDs: dblp:conf/pakdd/ZhangDZWSJ25