Read Before Imputing: Injecting PubMed-Informed Semantic Priors for Multi-Tissue Gene Expression Imputation

ICLR 2026 Conference Submission 15546 Authors

19 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: AI for healthcare, imputation
TL;DR: We present ReadImpute, a novel framework that incorporates literature-derived semantic priors via retrieval-augmented generation (RAG), significantly improving multi-tissue gene expression imputation performance.
Abstract: The integration of gene expression across tissues and cell types is essential for uncovering the systemic biological mechanisms that underlie disease and homeostasis. Yet in practice, gene expression data are rarely available for all tissues, posing a major barrier to understanding cross-tissue regulation and disease etiology. Existing methods attempt to overcome this issue by imputing tissue-specific gene expression from the observed expression of other tissues. However, these methods rely solely on observed data while overlooking biological priors—fundamental knowledge sources that can critically enhance biologically meaningful predictions. To address this limitation, we propose ReadImpute, a novel framework for multi-tissue gene expression imputation that injects semantic priors derived from biomedical literature into the imputation process. ReadImpute leverages retrieval-augmented generation (RAG) with a local large language model (LLM) to distill PubMed articles into semantic embeddings of genes and tissues, which serve as external priors guiding a neural network for multi-tissue gene expression imputation. Extensive experimental results demonstrate that ReadImpute significantly improves imputation performance and generalizes well to unseen tissue profiles. ReadImpute bridges the gap between biomedical literature and data-driven learning, offering a biologically grounded solution for gene expression imputation.
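The abstract describes literature-derived gene and tissue embeddings acting as external priors for an imputation network. One way this conditioning could look is sketched below, purely as an illustration: the embeddings are random placeholders standing in for the RAG output, the network is an untrained two-layer perceptron, and the names `build_features` and `mlp_forward` are hypothetical rather than taken from ReadImpute.

```python
import numpy as np

def build_features(observed, gene_prior, tissue_prior):
    """Concatenate observed expression with prior embeddings, one row per gene.

    observed:     (n_genes, n_source_tissues) expression in measured tissues
    gene_prior:   (n_genes, d) gene embeddings (placeholder for text-derived priors)
    tissue_prior: (d,) embedding of the target tissue (placeholder)
    """
    n_genes = observed.shape[0]
    # The target tissue is the same for every gene, so repeat its embedding per row.
    tissue_block = np.tile(tissue_prior, (n_genes, 1))
    return np.concatenate([observed, gene_prior, tissue_block], axis=1)

def mlp_forward(x, w1, b1, w2, b2):
    """Two-layer perceptron with ReLU; returns one predicted value per row."""
    hidden = np.maximum(x @ w1 + b1, 0.0)
    return (hidden @ w2 + b2).ravel()

rng = np.random.default_rng(0)
n_genes, n_source, d, h = 5, 3, 4, 8

observed = rng.normal(size=(n_genes, n_source))
gene_prior = rng.normal(size=(n_genes, d))   # assumption: random stand-in
tissue_prior = rng.normal(size=d)            # assumption: random stand-in

features = build_features(observed, gene_prior, tissue_prior)

# Untrained weights: this only demonstrates shapes and data flow, not accuracy.
w1 = rng.normal(size=(features.shape[1], h)) * 0.1
b1 = np.zeros(h)
w2 = rng.normal(size=(h, 1)) * 0.1
b2 = np.zeros(1)

imputed = mlp_forward(features, w1, b1, w2, b2)  # one value per gene for the target tissue
```

In this sketch the priors enter only through input concatenation; the actual framework may inject them differently, which the abstract does not specify.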
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 15546