QUANTIFYING KNOWLEDGE: A BAYESIAN FRAMEWORK FOR LLM-DRIVEN CLASSIFICATION

Chenxi He

17 Sept 2025 (modified: 11 Feb 2026). Submitted to ICLR 2026. License: CC BY 4.0

Keywords: Bayesian Model Combination, Large Language Models, Domain Knowledge Integration, Classification, Information Gain

Abstract: Integrating domain-specific knowledge into machine learning models is a critical challenge, especially in complex classification tasks where data features are ambiguous. This paper introduces a general framework, LLM-Enhanced Bayesian Model Combination (LLM-BMC), which leverages large language models (LLMs) to incorporate structured domain knowledge into a Bayesian model combination process, dynamically refining classification probabilities. We present a rigorous mathematical formulation of the λ parameter, which quantifies the influence of domain knowledge on model predictions, and validate its effectiveness through information gain analysis and convergence studies. Our framework systematically improves classification performance, particularly in scenarios with overlapping features or heterogeneous populations. As a demonstration case, we apply the framework to single-cell classification, where it excels in handling overlapping markers for different classes. This work provides a formal mathematical bridge between data-driven predictions and structured domain reasoning, establishing and empirically validating a principled methodology for knowledge-intensive classification.

Primary Area: probabilistic methods (Bayesian methods, variational inference, sampling, UQ, etc.)

Submission Number: 9651
