Readers: Everyone (since 04 Oct 2024). License: CC BY 4.0
In recent years, machine learning (ML) has significantly impacted chemistry, enabling advances in applications such as molecular property prediction and molecular structure generation. Traditional string representations such as the Simplified Molecular Input Line Entry System (SMILES), although widely adopted, are limited in how well they convey the essential physical and chemical properties of compounds. By contrast, vector representations, particularly chemical fingerprints, have proven effective across a range of ML tasks. Graph-based models, which exploit the inherent structure of chemical compounds, have also shown promise in improving predictive accuracy. This study investigates fingerprint-based language models within a bimodal architecture that combines graph-based and language model components. We propose a method that integrates these approaches, significantly improving predictive performance over conventional methods while capturing more accurate chemical information.
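To make the idea of a fingerprint as language-model input more concrete, the following is a minimal sketch in pure Python. It is not the method proposed in this study: the function names (`circular_fingerprint`, `fingerprint_tokens`), the hashing scheme, the 64-bit fingerprint size, and the `fp_<index>` token format are all hypothetical choices made for illustration. It assumes a molecule is given as a list of atom symbols plus a list of bonds, builds a toy Morgan-style circular fingerprint by repeatedly hashing each atom together with its neighbours, and then turns the set bits into a token sequence.

```python
import hashlib


def _stable_hash(text: str) -> int:
    # hashlib gives the same value on every run, unlike the built-in hash() on str.
    return int.from_bytes(hashlib.sha256(text.encode()).digest()[:8], "big")


def circular_fingerprint(atoms, bonds, radius=2, n_bits=64):
    """Toy Morgan-style fingerprint (illustrative only, not RDKit-compatible).

    atoms: list of element symbols, e.g. ["C", "C", "O"]
    bonds: list of (i, j) index pairs into `atoms`
    Returns the sorted list of on-bit indices in a folded vector of n_bits.
    """
    neighbors = {i: [] for i in range(len(atoms))}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)

    # Radius 0: each atom is identified by its element symbol alone.
    ids = [_stable_hash(symbol) for symbol in atoms]
    on_bits = {identifier % n_bits for identifier in ids}

    # Each further round mixes in the (sorted) identifiers of the neighbours,
    # so an identifier describes a progressively larger neighbourhood.
    for _ in range(radius):
        ids = [
            _stable_hash(
                f"{ids[i]}|" + ",".join(str(v) for v in sorted(ids[j] for j in neighbors[i]))
            )
            for i in range(len(atoms))
        ]
        on_bits.update(identifier % n_bits for identifier in ids)

    return sorted(on_bits)


def fingerprint_tokens(on_bits):
    # Hypothetical tokenisation: one token per set bit.
    return [f"fp_{bit}" for bit in on_bits]


# Ethanol, heavy atoms only: C-C-O
ethanol_atoms = ["C", "C", "O"]
ethanol_bonds = [(0, 1), (1, 2)]
ethanol_bits = circular_fingerprint(ethanol_atoms, ethanol_bonds)
ethanol_tokens = fingerprint_tokens(ethanol_bits)
```

In a bimodal setup of the kind described above, the same `(atoms, bonds)` pair could serve as input to the graph branch while the token sequence feeds the language-model branch. How the two branches are actually combined is not specified here, so the sketch stops at the inputs.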