Track: tiny / short paper (up to 4 pages)
Keywords: Agentic Reasoning, Biomarker Discovery, Selective State Space Models (SSMs), Scientific Hypothesis Verification, Neuro-Symbolic AI
Abstract: Gradient saliency from deep sequence models surfaces candidate biomarkers efficiently, but the resulting gene lists can be contaminated by tissue-composition
confounders that degrade downstream classifiers. We study whether LLM chain-of-thought (CoT) reasoning can filter these confounders, and whether reasoning
quality is associated with downstream performance. We train a Mamba SSM
on TCGA-BRCA RNA-seq and extract the top-50 genes by gradient saliency;
DeepSeek-R1 evaluates every candidate with structured CoT to produce a final
17-gene set. On the held-out test split, the raw 50-gene saliency set (no LLM)
performs worse than a 5,000-gene variance baseline (AUC 0.832 vs. 0.903), while
the LLM-filtered set surpasses it (AUC 0.927) with 294× fewer features (17 vs. 5,000). A
faithfulness audit (COSMIC CGC, OncoKB, PAM50) shows that 6 of 17 selected
genes (35.3%) are validated BRCA biomarkers, while 10 of 16 known BRCA genes
present in the input were missed, including FOXA1. This divergence between
downstream performance and reasoning faithfulness suggests selective faithfulness
in this setting: targeted confounder removal can improve predictive performance
without comprehensive recall.
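As a quick consistency check on the counts quoted in the abstract, the short Python sketch below recomputes the feature-reduction factor and the validated fraction of the selected set. The variable names are illustrative, and the recall value (6 of 16) is derived here from the stated counts rather than reported in the abstract.

```python
# Counts taken from the abstract; variable names are illustrative only.
baseline_features = 5000   # genes in the variance baseline
selected_genes = 17        # genes kept after LLM filtering
validated_selected = 6     # selected genes that are validated BRCA biomarkers
known_in_input = 16        # known BRCA genes among the top-50 saliency candidates
missed_known = 10          # known BRCA genes the filter discarded

# The selected and missed known genes should account for all known genes in the input.
assert validated_selected + missed_known == known_in_input

feature_reduction = baseline_features / selected_genes
precision = validated_selected / selected_genes   # validated fraction of the selected set
recall = validated_selected / known_in_input      # derived, not stated in the abstract

print(f"feature reduction: {feature_reduction:.0f}x")   # 294x
print(f"validated fraction: {precision:.1%}")           # 35.3%
print(f"recall of known genes: {recall:.1%}")           # 37.5%
```

Under these counts, both the validated fraction and the recall of known genes are below 40%, which is the gap between faithfulness and downstream AUC that the abstract describes.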
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Funding: Yes, the presenting author of this submission falls under ICLR’s funding aims, and funding would significantly impact their ability to attend the workshop in person.
Submission Number: 123