Bridging chemical modalities by aligning embeddings

Adrian Mirza; Sebastian Starke; Erinc Merdivan; Kevin Maik Jablonka

Published: 08 Jul 2024, Last Modified: 23 Jul 2024, AI4Mat-Vienna-2024 Oral, CC BY 4.0

Submission Track: Short Paper

Submission Category: AI-Guided Design

Keywords: multimodal model, retrieval, property retrieval, small molecules

TL;DR: We introduce a machine learning model that leverages five modalities and show how its embeddings can be used to retrieve molecules with similar properties.

Abstract: Chemistry represents molecules in many different ways, and many of these representations are abundant in the literature yet underutilized. Frameworks that combine these different representations into a single one are lacking. We therefore introduce MoleculeBind, a multimodal machine learning model trained with contrastive learning to align five modalities: SMILES, SELFIES, graphs, fingerprints, and 3D structures. We evaluate retrieval metrics for the model and obtain high performance across all modalities. We also explore querying for molecules with similar properties using the same approach; this property-based retrieval significantly outperformed a random baseline. We expect such a model to have a great impact on spectroscopy and to improve the performance of existing generative methods.
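
The abstract does not specify the loss or the retrieval metric, so the following is only a minimal sketch of what contrastive alignment and cross-modal retrieval could look like, not the MoleculeBind implementation. It assumes a symmetric InfoNCE (CLIP-style) objective and recall@k as the metric; the function names, the temperature of 0.07, and the toy embeddings are all hypothetical.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def similarity_matrix(a, b):
    """Cosine similarity between every row of a and every row of b."""
    a = [l2_normalize(r) for r in a]
    b = [l2_normalize(r) for r in b]
    return [[sum(x * y for x, y in zip(ra, rb)) for rb in b] for ra in a]

def info_nce(a, b, temperature=0.07):
    """Symmetric InfoNCE loss (assumed objective, not confirmed by the abstract).

    Row i of a and row i of b are embeddings of the same molecule in two
    modalities; every other row in the batch acts as a negative.
    """
    sim = similarity_matrix(a, b)
    n = len(sim)
    logits = [[s / temperature for s in row] for row in sim]

    def cross_entropy_diag(m):
        # Mean cross-entropy with the diagonal entry as the correct class.
        total = 0.0
        for i, row in enumerate(m):
            mx = max(row)  # subtract the max for numerical stability
            lse = mx + math.log(sum(math.exp(x - mx) for x in row))
            total += lse - row[i]
        return total / n

    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(transposed))

def recall_at_k(queries, candidates, k=1):
    """Fraction of queries whose true partner (same index) ranks in the top k."""
    sim = similarity_matrix(queries, candidates)
    hits = 0
    for i, row in enumerate(sim):
        top = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
        hits += i in top
    return hits / len(sim)

# Toy embeddings (made up): three molecules seen through two modalities.
smiles_emb = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.1], [0.1, 0.0, 1.0]]
graph_emb = [[0.9, 0.2, 0.1], [0.1, 0.8, 0.0], [0.0, 0.1, 0.9]]
shuffled = graph_emb[1:] + graph_emb[:1]  # deliberately misaligned pairs

aligned_loss = info_nce(smiles_emb, graph_emb)
misaligned_loss = info_nce(smiles_emb, shuffled)
top1 = recall_at_k(smiles_emb, graph_emb, k=1)
```

In this toy setting the loss is lower when matching rows describe the same molecule than when the pairs are shuffled, and every SMILES query retrieves its own graph embedding first. In a real model the embeddings would come from modality-specific encoders trained to minimise such a loss over many pairs.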

Submission Number: 10
