Patient-Specific Biomolecular Instruction Tuning of Graph-LLMs

18 Sept 2025 (modified: 12 Feb 2026); ICLR 2026 Conference Desk Rejected Submission; CC BY 4.0
Keywords: Large Language Models, Foundation Models, Graph-LLM, Instruction Tuning, Multi-modal LLMs, Bioinformatics, Proteomics
Abstract: Proteomics data is essential to understanding the pathogenesis of a disease phenotype. In cancer, analysis of molecular signatures enables precision medicine through the identification of biological processes that drive individualized tumor progression, therapeutic resistance, and clinical heterogeneity. Recent multimodal large language models (LLMs) have shown a remarkable capacity to integrate and reason across heterogeneous data modalities. However, multimodal language modeling for molecular understanding of patient-specific proteomics remains a significant challenge due to two barriers: (1) the lack of instruction-tuning datasets that enable clinical interpretation from proteomics data, and (2) the absence of language-modeling architectures designed to capture the rich heterogeneity of molecular data. In this work, we introduce cptac-prot-instruct, the first patient-centric instruction-tuning dataset for molecular understanding of oncology, comprising over 370k open-ended examples derived from individualized proteomic profiles curated from the largest national proteomics cancer study (CPTAC). Additionally, we propose KRONOS (Knowledge Representation of individualized Omics Networks via Structured tuning), a novel graph-LLM framework that leverages molecular interaction topology with proteomics to learn patient-specific graph representations for enhanced clinical reasoning. We show that KRONOS achieves consistent improvements across benchmark clinical tasks, with AUC of up to $0.857\pm0.025$ in prognostic tasks such as mortality prediction, cancer type OS prediction, and tumor stage classification from proteomics data. Ultimately, this approach empowers LLMs to understand patient-level pathogenesis, advancing precision medicine through more accurate diagnosis, prognosis, and treatment stratification.
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 11129