TCMReasonSet: A Dataset for Explainable Medical Reasoning in Traditional Chinese Medicine

18 Sept 2025 (modified: 06 Jan 2026) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Dataset, Traditional Chinese Medicine, Large Language Model Reasoning
Abstract: Large language models (LLMs) excel at structured tasks such as mathematics and programming but remain limited in knowledge-intensive domains like Traditional Chinese Medicine (TCM), which demand complex reasoning. The primary bottleneck is the scarcity of high-quality training corpora that are well structured and explicitly traceable in their reasoning pathways. To address this, we introduce TCMReasonSet, a high-quality dataset designed for TCM clinical reasoning that aims to enhance the reliability and interpretability of LLMs on TCM-related problems. The construction of TCMReasonSet comprises three core components: (1) a proprietary TCM knowledge graph we developed, containing 52,000 entities and 1.38 million relations, which serves as the foundation for dynamic retrieval and reasoning; (2) clinical question-answer pairs generated by LLMs and grounded in this knowledge graph; and (3) the “TCM Tree-of-Thought” (TCM-ToT) methodology, built on the knowledge graph and QA pairs, which applies a dual-dimension scoring mechanism (logical consistency plus factual accuracy) to evaluate clinical QA pairs and transform them into coherent, interpretable reasoning chains with explicit pathways. Through this pipeline we generated 36,573 clinically interpretable reasoning samples. Experimental results demonstrate that fine-tuning with TCMReasonSet significantly improves medical problem-solving performance: the DeepSeek-Distill-8B model achieves an 8.9% accuracy gain, and our TCMReason-8B model surpasses the current state-of-the-art medical reasoning model by 5.7%. Expert evaluations further validate the dataset's reliability in terms of factual accuracy and logical coherence.
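The abstract does not specify how the two scoring dimensions are combined or how candidate reasoning steps are pruned during tree expansion. The sketch below is a minimal illustration, assuming each candidate step receives a logical-consistency score and a factual-accuracy score in [0, 1] (e.g., from an LLM judge and a knowledge-graph check, respectively); the weighting, threshold, and beam width are illustrative assumptions, not values from the paper.

```python
from dataclasses import dataclass


@dataclass
class ThoughtNode:
    """One candidate reasoning step in the tree of thought."""
    text: str
    logic_score: float  # logical consistency, assumed in [0, 1]
    fact_score: float   # factual accuracy vs. the knowledge graph, assumed in [0, 1]


def combined_score(node: ThoughtNode, w_logic: float = 0.5) -> float:
    """Weighted combination of the two scoring dimensions (weight is illustrative)."""
    return w_logic * node.logic_score + (1.0 - w_logic) * node.fact_score


def prune_frontier(candidates: list[ThoughtNode],
                   threshold: float = 0.7,
                   beam_width: int = 3) -> list[ThoughtNode]:
    """Drop low-scoring candidate steps, then keep the top-k as the new frontier."""
    kept = [n for n in candidates if combined_score(n) >= threshold]
    kept.sort(key=combined_score, reverse=True)
    return kept[:beam_width]


# Example: two candidates pass the threshold; one is pruned.
frontier = prune_frontier([
    ThoughtNode("Tongue coating suggests damp-heat pattern", 0.9, 0.8),
    ThoughtNode("Pulse finding contradicts earlier step", 0.3, 0.6),
    ThoughtNode("Symptom cluster matches KG relation", 0.8, 0.9),
])
print([n.text for n in frontier])
```

Under these assumptions, only steps that are both logically coherent with the chain so far and supported by the knowledge graph survive, which is one plausible way a dual-dimension filter could yield the "coherent, interpretable reasoning chains" the abstract describes.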
Primary Area: foundation or frontier models, including LLMs
Submission Number: 11820