Extracting Probabilistic Knowledge from Large Language Models for Bayesian Network Parameterization

Published: 21 May 2026, Last Modified: 31 May 2026. Accepted by TMLR. Readers: Everyone. License: CC BY 4.0
Abstract: In this work, we evaluate the potential of Large Language Models (LLMs) in building Bayesian Networks (BNs) by approximating domain expert priors. LLMs have demonstrated potential as factual knowledge bases; however, their capability to generate probabilistic knowledge about real-world events remains understudied. We explore utilizing the probabilistic knowledge inherent in LLMs to derive probability estimates for statements regarding events and their relationships within a BN. Using LLMs in this context allows for the parameterization of BNs, enabling probabilistic modeling within specific domains. Our experiments on eighty publicly available Bayesian Networks, from healthcare to finance, demonstrate that querying LLMs about the conditional probabilities of events provides meaningful results when compared to baselines, including random and uniform distributions, as well as approaches based on next-token generation probabilities. We explore how these LLM-derived distributions can serve as expert priors to refine distributions extracted from data, especially when data is scarce. Overall, this work introduces a promising strategy for automatically constructing Bayesian Networks by combining probabilistic knowledge extracted from LLMs with real-world data. Additionally, we establish the first comprehensive baseline for assessing LLM performance in extracting probabilistic knowledge.
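One common way to use an elicited distribution as an expert prior is to treat it as Dirichlet pseudo-counts and add them to the observed counts for one row of a conditional probability table. The sketch below illustrates that idea only; it is not taken from the paper, and the function name `combine_prior_with_counts` and the `prior_strength` parameter are hypothetical.

```python
def combine_prior_with_counts(llm_prior, counts, prior_strength=10.0):
    """Posterior-mean estimate for one CPT row under a Dirichlet prior.

    llm_prior      -- probabilities elicited from an LLM for each state
    counts         -- observed counts for each state in the data
    prior_strength -- equivalent sample size given to the prior
                      (hypothetical knob; assumed here, not from the paper)
    """
    if len(llm_prior) != len(counts):
        raise ValueError("prior and counts must cover the same states")
    if prior_strength <= 0:
        raise ValueError("prior_strength must be positive")
    total = sum(llm_prior)
    if total <= 0:
        raise ValueError("prior must have positive mass")

    # Renormalize in case the elicited probabilities do not sum to 1.
    normalized = [p / total for p in llm_prior]
    # Convert the prior into pseudo-counts.
    pseudo = [prior_strength * p for p in normalized]
    denom = sum(counts) + prior_strength
    return [(c + a) / denom for c, a in zip(counts, pseudo)]


# With few observations the estimate stays close to the prior;
# with many observations the data dominates.
scarce = combine_prior_with_counts([0.7, 0.3], [1, 3])
plentiful = combine_prior_with_counts([0.7, 0.3], [100, 300])
```

A larger `prior_strength` keeps the estimate closer to the elicited distribution, which matters most in the scarce-data regime the abstract highlights.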
Submission Type: Regular submission (no more than 12 pages of main content)
Supplementary Material: zip
Changes Since Last Submission: N/A for previous TMLR resubmission. This is the camera-ready revision of an accepted submission.
1. We expanded the Limitations and Ethical Considerations section. Beyond the existing discussion of domain-dependent LLM performance and the inadequacy of merely beating a uniform baseline for safety-critical use, we added two new limitations: (i) we cannot rule out that some bnRep networks appear in the LLMs' pre-training corpora, although our contribution is methodological and remains valuable regardless of contamination; and (ii) LLMs encode social and demographic biases that can propagate through the elicited priors into BN-based decisions, so practitioners should audit elicited distributions before deploying them in sensitive domains.
2. We added a central Cost-Benefit Analysis paragraph in the Discussion. The previous "Trade-off Between SepState and FullDist Schemes" paragraph has been replaced by a "Cost Considerations for LLM-Based BN Parameterization" subsection that consolidates points previously scattered across the reviewer discussion: (a) cost-reduction strategies and (b) a comparison against realistic alternatives for BN parameterization, namely large-scale domain-specific data collection (often infeasible, especially for rare diseases or novel domains) and human expert elicitation (costly, scarce, and dependent on vetting and opinion aggregation).
These changes are in addition to those suggested by reviewers and made during the rebuttal phase. The camera-ready version also includes minor updates such as author names/affiliations, the OpenReview forum link, and the publication month/year.
Code: https://github.com/HLR/llm-bn-parameterization
Assigned Action Editor: ~Sebastian_Tschiatschek1
Submission Number: 7055