Explainable Vision-Language Model for Personalized Medicine

Published: 27 Jan 2025, Last Modified: 13 Mar 2025, TIME 2025 Poster, CC BY 4.0
Keywords: Fast Fourier Transform, Bilateral Laplace Transform, Vision-Language Models
TL;DR: Vision-Language Model for Personalized Medicine
Abstract: Recent advancements in computer vision (CV) and natural language processing (NLP) have led to the emergence of Vision-Language Models (VLMs), which excel at interpreting complex multimodal information by integrating visual and textual data. This paper proposes a novel, interpretable framework that combines VLMs with two mathematical transforms to enhance drug discovery and personalized medicine: the Fast Fourier Transform (FFT), for efficient computation of frequency-domain representations, and the Bilateral Laplace Transform, for stability analysis in nonlinear systems. The FFT, applied in an interpretable manner, identifies periodic patterns in temporal gene expression data from genes such as TP53 and EGFR, which are crucial for understanding circadian influences on drug metabolism. The Bilateral Laplace Transform, also applied in an interpretable manner, assesses system stability and response under various therapeutic interventions, focusing on genes such as BRCA1 and PTEN for short-term treatment outcomes. The integrated model uses the VLM to synthesize and contextualize the transformed data, providing a robust and interpretable analytical tool for predicting individual drug responses and optimizing treatment strategies. Validation of the proposed framework on multimodal datasets comprising clinical imaging, genomic data, and textual descriptions confirms its potential to significantly improve the precision of personalized treatment plans. The outcomes of this research advance our understanding of complex drug interactions within the human body and pave the way for a user-friendly, interpretable tool that assists clinicians in real-time decision-making, ultimately enhancing patient outcomes in clinical settings. We have made all resources available on GitHub to support and encourage future studies and advancements based on our findings.
You can access them at https://github.com/Sarwar-UTS/Interpretable-VLMs-for-Medicine.
Submission Number: 9
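
The abstract describes using the FFT to find periodic patterns in temporal gene expression data. The following is a minimal sketch of that general idea, not the authors' implementation: the function name `dominant_period`, the 2-hour sampling interval, and the synthetic 24-hour signal are all assumptions made for illustration, and no real expression data is used.

```python
import numpy as np

def dominant_period(values, sample_interval_hours):
    """Return the period (in hours) of the strongest non-constant frequency.

    Hypothetical helper for illustration; assumes evenly spaced samples.
    """
    values = np.asarray(values, dtype=float)
    centered = values - values.mean()            # remove the constant offset
    spectrum = np.abs(np.fft.rfft(centered))     # magnitude spectrum
    freqs = np.fft.rfftfreq(len(centered), d=sample_interval_hours)
    peak = np.argmax(spectrum[1:]) + 1           # skip the zero-frequency bin
    return 1.0 / freqs[peak]

# Synthetic series (an assumption, not real data): a 24 h rhythm
# sampled every 2 h over 4 days, with a small amount of noise.
t = np.arange(0, 96, 2.0)
rng = np.random.default_rng(0)
expression = 5.0 + 2.0 * np.sin(2 * np.pi * t / 24.0) + 0.1 * rng.normal(size=t.size)

period = dominant_period(expression, sample_interval_hours=2.0)
print(f"Estimated period: {period:.1f} h")  # close to 24 h for this synthetic input
```

The spectral peak is interpretable in the sense that it maps directly to a period in hours, which is the kind of quantity the abstract relates to circadian effects on drug metabolism.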