Bayesian Inference for Correlated Human Experts and Classifiers

Markelle Kelly; Alex James Boyd; Sam Showalter; Mark Steyvers; Padhraic Smyth

Published: 01 May 2025, Last Modified: 18 Jun 2025, ICML 2025 poster, CC BY 4.0

Abstract: Applications of machine learning often involve making predictions based on both model outputs and the opinions of human experts. In this context, we investigate the problem of querying experts for class label predictions, using as few human queries as possible, and leveraging the class probability estimates of pre-trained classifiers. We develop a general Bayesian framework for this problem, modeling expert correlation via a joint latent representation, enabling simulation-based inference about the utility of additional expert queries, as well as inference of posterior distributions over unobserved expert labels. We apply our approach to two real-world medical classification problems, as well as to CIFAR-10H and ImageNet-16H, demonstrating substantial reductions relative to baselines in the cost of querying human experts while maintaining high prediction accuracy.
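As a rough illustration of the idea of modeling expert correlation through a joint latent representation and using simulation to infer unobserved expert labels, the following toy sketch may help. It is not the authors' model: the binary-class setup, the thresholded Gaussian latent variable, the rejection-sampling step, and the function name `consensus_posterior` are all assumptions made here for illustration only.

```python
import numpy as np

def consensus_posterior(mu, cov, observed, n_samples=20000, seed=0):
    """Monte Carlo estimate of P(majority vote = 1 | observed votes).

    Hypothetical toy model: each expert k has a latent score z_k, the scores
    are jointly Gaussian, and expert k votes for class 1 when z_k > 0.

    mu, cov  : mean vector and covariance matrix of the latent scores
    observed : dict {expert_index: vote in {0, 1}} for experts already queried
    """
    rng = np.random.default_rng(seed)
    z = rng.multivariate_normal(mu, cov, size=n_samples)
    votes = (z > 0).astype(int)

    # Rejection step: keep only simulated worlds that agree with the observed votes.
    keep = np.ones(n_samples, dtype=bool)
    for k, v in observed.items():
        keep &= votes[:, k] == v
    if not keep.any():
        raise ValueError("no samples match the observed votes; increase n_samples")

    # Strict majority of all experts voting for class 1.
    majority = votes[keep].sum(axis=1) * 2 > votes.shape[1]
    return float(majority.mean())

# Three strongly correlated experts; a zero mean stands in for an uncertain classifier.
mu = np.zeros(3)
cov = np.full((3, 3), 0.8) + 0.2 * np.eye(3)

p_before = consensus_posterior(mu, cov, observed={})
p_after = consensus_posterior(mu, cov, observed={0: 1})
print(p_before, p_after)
```

In this sketch, a single observed vote shifts the estimated consensus probability well away from the prior because the experts are correlated, which is the intuition behind needing fewer queries. A fuller treatment would also compare the expected benefit of each possible next query before choosing whom to ask.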

Lay Summary: In many high-stakes domains (such as healthcare), human experts and machine learning (ML) models work together to make decisions. However, consulting multiple human experts for every case (e.g., radiologists reviewing an X-ray) is often impractical and expensive. Our research addresses the challenge of accurately predicting what a group of human experts would conclude—without always having to ask them. To this end, we developed a statistical approach that learns how each member of a group of human experts (and ML models) usually makes predictions, capturing relationships between the different agents. By leveraging these relationships, our method helps to minimize querying of human experts, choosing whom to query and then predicting the remaining, unobserved opinions. We tested our approach on several real-world image classification tasks, showing that it can accurately predict the final expert conclusion while making fewer expert queries on average. Altogether, this work makes collaborative human-AI decision-making more efficient and affordable—especially in high-stakes settings where expert input is valuable but limited.

Primary Area: Probabilistic Methods->Everything Else

Keywords: human-AI, consensus, Bayesian

Submission Number: 13891
