MIR: A Benchmark for Molecular Image Retrieval with a Cross-modal Pretraining Framework

Baole Wei, Ruiqi Jia, Shihan Fu, Xiaoqing Lyu, Liangcai Gao, Zhi Tang

Published: 01 Jan 2022, Last Modified: 06 May 2023, BIBM 2022, Readers: Everyone

Abstract: Molecular image retrieval is a crucial step in the automatic mining and utilization of biochemistry-related literature, and it remains a relatively open and challenging task at the intersection of biochemistry and artificial intelligence. The challenges come from two aspects: 1) there is a lack of open datasets and evaluation criteria for molecular image retrieval; 2) common retrieval methods ignore that molecular image retrieval involves cross-modal information from both images and SMILES texts. To address the first challenge, we construct a new molecular image retrieval benchmark, named MIR, which includes 130,770 molecular images, labeled structural similarity, and reasonable evaluation metrics. To address the second challenge, we propose an effective cross-modal pre-training framework for molecular image retrieval, following CLIP. Experimental results show the effectiveness of the proposed MIR benchmark and cross-modal pre-training framework.
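
The abstract does not describe the pre-training objective beyond "following CLIP". As a rough illustration only, a CLIP-style symmetric contrastive loss over paired image and SMILES embeddings might be sketched as below. The function names, the temperature value of 0.07, and the use of plain NumPy arrays in place of real encoders are all assumptions made for this sketch, not details taken from the paper.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def log_softmax(logits, axis):
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def clip_style_loss(image_emb, smiles_emb, temperature=0.07):
    # Hypothetical helper: row i of image_emb and row i of smiles_emb are
    # assumed to describe the same molecule (a positive pair); every other
    # row in the batch acts as a negative.
    img = l2_normalize(image_emb)
    txt = l2_normalize(smiles_emb)
    logits = img @ txt.T / temperature        # (batch, batch) similarity matrix
    idx = np.arange(logits.shape[0])
    # Image-to-SMILES direction: softmax over each row.
    loss_i2t = -log_softmax(logits, axis=1)[idx, idx].mean()
    # SMILES-to-image direction: softmax over each column.
    loss_t2i = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_i2t + loss_t2i)
```

In this sketch the loss is small when matching image and SMILES embeddings are closer to each other than to the other items in the batch, which is the property a retrieval model would then exploit at query time.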
