LLM Kernel: a framework for verifiable evaluation of scientific data interpretations

William Connell; Drishti Guin; Clayton Mellina

Published: 24 Sept 2025, Last Modified: 15 Oct 2025, Venue: NeurIPS2025-AI4Science Poster, Readers: Everyone, License: CC BY 4.0

Track: Track 1: Original Research/Position/Education/Attention Track

Keywords: LLM, verification, kernel, similarity, transcriptomics

TL;DR: LLM Kernel is a framework that makes an LLM’s interpretation of data verifiable by prompting it to produce a quantitative similarity score directly linked to its qualitative reasoning trace.

Abstract: Large language models (LLMs) have demonstrated strong performance on structured tasks such as mathematics and scientific problem-solving, but their role in open-ended discovery science remains limited by the difficulty of validating their complex reasoning. Here we introduce LLM Kernel, a framework that makes an LLM's interpretation of data verifiable by prompting it to produce a quantitative similarity score directly linked to its qualitative reasoning trace. Applied to transcriptomics, an LLM kernel consistently outperforms standard numerical approaches in recovering known biological relationships, with performance improving as a function of compute. Ablation experiments show that performance depends on the model's biological knowledge of gene identities rather than mere approximation of statistical correlations. Furthermore, the framework's flexibility enables novel cross-modal comparisons: an LLM kernel can score the similarity between a natural language description of a disease and a numerical gene expression profile to identify relevant therapeutic compounds. LLM Kernel provides a scalable approach to quantitatively benchmark model reasoning, representing a step towards auditable AI for scientific interpretation.
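To make the idea of a score tied to a reasoning trace concrete, the following is a minimal sketch and not the authors' implementation. The prompt wording, the JSON reply format, the [0, 1] score range, the unit diagonal, and the names `build_prompt`, `llm_kernel`, `kernel_matrix`, and `stub_llm` are all assumptions made for illustration. `stub_llm` is a deterministic stand-in that scores token overlap, so the example runs without any model; a real LLM client would be passed in its place.

```python
import json
import re
from typing import Callable, List, Sequence, Tuple


def build_prompt(a: str, b: str) -> str:
    # Hypothetical prompt: ask for reasoning and a score in a single reply,
    # so the number can be audited against the explanation behind it.
    return (
        "Compare the two inputs below. Explain your reasoning, then give a "
        "similarity score between 0 and 1.\n"
        'Reply as JSON: {"reasoning": "...", "score": <number>}\n\n'
        f"Item A:\n{a}\n\nItem B:\n{b}\n"
    )


def llm_kernel(a: str, b: str, llm: Callable[[str], str]) -> Tuple[float, str]:
    """Return (score, reasoning) for one pair of inputs."""
    reply = llm(build_prompt(a, b))
    match = re.search(r"\{.*\}", reply, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in the model reply")
    parsed = json.loads(match.group(0))
    score = min(1.0, max(0.0, float(parsed["score"])))  # clamp to [0, 1]
    return score, str(parsed["reasoning"])


def kernel_matrix(items: Sequence[str], llm: Callable[[str], str]) -> List[List[float]]:
    """Pairwise similarity matrix; one call per unordered pair (assumed symmetric)."""
    n = len(items)
    K = [[1.0] * n for _ in range(n)]  # assumption: self-similarity is 1
    for i in range(n):
        for j in range(i + 1, n):
            score, _ = llm_kernel(items[i], items[j], llm)
            K[i][j] = K[j][i] = score
    return K


def stub_llm(prompt: str) -> str:
    # Stand-in for a real model: Jaccard overlap of whitespace-separated tokens.
    body = prompt.split("Item A:\n", 1)[1]
    a_text, b_text = body.split("\n\nItem B:\n", 1)
    a, b = set(a_text.split()), set(b_text.split())
    score = len(a & b) / len(a | b) if (a | b) else 0.0
    reasoning = f"shared tokens: {sorted(a & b)}"
    return json.dumps({"reasoning": reasoning, "score": score})


if __name__ == "__main__":
    profiles = ["TP53 MDM2 CDKN1A", "TP53 MDM2 BAX", "ALB APOA1"]
    print(kernel_matrix(profiles, stub_llm))
    print(llm_kernel(profiles[0], profiles[1], stub_llm))
```

Because `llm_kernel` takes plain strings, the same function could compare a text description with a serialized expression profile, in the spirit of the cross-modal use the abstract describes. A matrix produced this way could then be scored against known relationships in the same manner as a numerical similarity baseline.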

Submission Number: 281
