Thought Graph: Balancing Specificity and Uncertainty in LLM-Based Gene Set Annotation

Kyle Cox, Gang Qu, Chi-Yang Hsu, Jiawei Xu, Yingtong Zhou, Zhen Tan, Mengzhou Hu, Tianlong Chen, Ziniu Hu, Zhongming Zhao, Ying Ding

Published: 2025, Last Modified: 16 Sept 2025, Venue: ICHI 2025, License: CC BY-SA 4.0

Abstract: Accurate predictive reasoning is a cornerstone of biomedical decision-making, particularly in precision oncology, where elucidating the intricate relationships between disease-risk genes and biological processes is critical. This study presents a novel "Thought Graph" methodology, an advancement of the Tree of Thoughts framework, that systematically generates and refines biological process representations derived from gene sets while addressing the trade-off between specificity and uncertainty. Balancing these factors is essential for robust and interpretable gene set analyses, as it accounts for the complexity, variability, and overlapping functions of biological pathways. Furthermore, we introduce a quantitative metric that integrates specificity and uncertainty, thereby enhancing the rigor and transparency of the inference process. Using a subset of the Gene Ontology database, we evaluate the effectiveness of our system in generating biologically meaningful terms that accurately describe the underlying biological processes of gene sets. We compare its performance against a domain-specific tool (GSEA) and five LLM baselines across multiple metrics. Our system achieves the highest cosine similarity (64.00%) and specificity percentile (96.40%), highlighting its capacity to generate terms closely aligned with human annotations while maintaining a balance between specificity and accuracy. By advancing artificial intelligence-driven analyses, this work facilitates more informed decision-making in biomedical research, precision oncology, and related fields.

External IDs: dblp:conf/ichi/CoxQH0Z0H0HZ025