Bottom-up Domain-specific Superintelligence: A Reliable Knowledge Graph is What We Need

TMLR Paper5443 Authors

22 Jul 2025 (modified: 03 Aug 2025) · Under review for TMLR · CC BY 4.0
Abstract: Language models, traditionally used for cross-domain generalization in natural language understanding and generation, have recently demonstrated task-specific reasoning through inference-time scaling. However, their top-down training on general text corpora is insufficient for acquiring the domain-specific abstractions that deep expertise requires. Deep expertise may instead demand a bottom-up approach that explicitly learns to compose a domain's simple concepts into more complex ones. A knowledge graph (KG) provides such an abstraction: domain primitives are captured by head-relation-tail triples, and a KG path formed from such triples captures a higher-level concept. We present a task-generation pipeline that synthesizes tasks directly from domain-specific primitives, enabling a model to explicitly acquire and compose these primitives for reasoning. We fine-tune language models on the resulting bottom-up, KG-grounded curriculum to demonstrate domain-specific superintelligence. Although our approach is readily applicable to a wide variety of domains, we validate it in medicine, where reliable KGs are available. Applying the proposed pipeline to a medical KG, we curate a dataset of 24,000 high-quality reasoning tasks paired with structured thinking traces derived from diverse medical primitives. We fine-tune the QwQ-32B model on this bottom-up curriculum to obtain QwQ-Med-3, which takes a step toward medical superintelligence. We also introduce an evaluation suite, ICD-Bench, to quantify the domain-specific reasoning capabilities of models across 15 medical domains. Our experiments demonstrate that QwQ-Med-3 significantly outperforms state-of-the-art open-source and proprietary reasoning models on all categories of ICD-Bench. Further analysis reveals that QwQ-Med-3 uses the acquired primitives to especially widen the performance gap on the hardest ICD-Bench tasks. Finally, evaluation on external medical question-answering benchmarks shows that QwQ-Med-3 transfers its acquired expertise to improve the performance of the base model. The industry's approach to artificial general intelligence (AGI) centers on breadth of acquired expertise. We envision a future in which a compositional model of AGI emerges from interacting superintelligent agents, much as human society hierarchically builds ever deeper expertise by combining the expertise of individuals in adjacent domains or super-domains. Furthermore, since language models fine-tuned for superintelligence can be relatively small (e.g., 32B parameters), this bottom-up approach may also significantly reduce training and inference energy costs.
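To make the triple-to-task idea concrete, the following is a minimal illustrative sketch (not the paper's actual pipeline) of how head-relation-tail primitives might be composed along a KG path and rendered as a multi-hop reasoning task. The toy KG, the `Triple` class, and the `sample_path`/`path_to_task` helpers are all hypothetical names introduced here for illustration.

```python
# Hypothetical sketch of bottom-up task generation from KG primitives.
# All names and the toy medical KG below are illustrative assumptions.
from dataclasses import dataclass
import random

@dataclass(frozen=True)
class Triple:
    head: str
    relation: str
    tail: str

# A toy medical KG: each triple is a domain primitive.
MEDICAL_KG = [
    Triple("type 2 diabetes", "increases_risk_of", "chronic kidney disease"),
    Triple("chronic kidney disease", "contraindicates", "metformin"),
    Triple("metformin", "first_line_treatment_for", "type 2 diabetes"),
]

def sample_path(kg, start, length=2):
    """Random walk over the KG: each hop follows a triple whose head is the current node."""
    path, node = [], start
    for _ in range(length):
        candidates = [t for t in kg if t.head == node]
        if not candidates:
            break
        t = random.choice(candidates)
        path.append(t)
        node = t.tail
    return path

def path_to_task(path):
    """Render a multi-hop path as a question whose answer requires composing every primitive."""
    premises = "; ".join(
        f"{t.head} {t.relation.replace('_', ' ')} {t.tail}" for t in path
    )
    return (f"Given that {premises}, what does this imply about "
            f"{path[0].head} and {path[-1].tail}?")

random.seed(0)
print(path_to_task(sample_path(MEDICAL_KG, "type 2 diabetes")))
# -> a two-hop task linking type 2 diabetes to metformin via chronic kidney disease
```

Under this sketch, harder tasks would simply correspond to longer sampled paths, since answering them requires composing more primitives; the paper additionally pairs each task with a structured thinking trace.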
Submission Length: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Jonathan_Berant1
Submission Number: 5443