TL;DR: We use LLMs to elicit prior distributions for linear predictive models, reducing the number of labelled examples required to build a predictive model. We further find that these elicited priors are more reliable than in-context learning on numerical predictive tasks.
Abstract: Large language models (LLMs) acquire a breadth of information across various domains. However, their computational complexity, cost, and lack of transparency often hinder their direct application for predictive tasks where privacy and interpretability are paramount. In fields such as healthcare, biology, and finance, specialised and interpretable linear models still hold considerable value. In such domains, labelled data may be scarce or expensive to obtain. Well-specified prior distributions over model parameters can reduce the sample complexity of learning through Bayesian inference; however, eliciting expert priors can be time-consuming. We therefore introduce AutoElicit to extract knowledge from LLMs and construct priors for predictive models. We show these priors are informative and can be refined using natural language. We perform a careful study contrasting AutoElicit with in-context learning and demonstrate how to perform model selection between the two methods. We find that AutoElicit yields priors that can substantially reduce error over uninformative priors, using fewer labels, and consistently outperform in-context learning. We show that AutoElicit saves over 6 months of labelling effort when building a new predictive model for urinary tract infections from sensor recordings of people living with dementia.
Lay Summary: In this work, we propose AutoElicit, a method for using LLMs to aid predictive modelling tasks, with a focus on healthcare. Specifically, we present a method for using LLMs to elicit expert prior distributions for linear predictive models and demonstrate how human experts can aid the process. We then compare the posterior predictions with those made through in-context learning, where language models make predictions directly. Using data from our study on dementia, we show that AutoElicit saves over 6 months of labelling effort when building a new predictive model for urinary tract infections from sensor recordings of participants.
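To make the core idea concrete, the snippet below is a minimal sketch (not the released AutoElicit implementation; see the linked repository for that) of how an LLM-elicited Gaussian prior over linear-model coefficients could be combined with a conjugate Bayesian linear-regression update. The prompt wording, the JSON response schema, and helper names such as `elicit_gaussian_prior` are illustrative assumptions, not taken from the paper.

```python
import json
import numpy as np
from openai import OpenAI  # official OpenAI Python client (>= 1.0)

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def elicit_gaussian_prior(task_description, feature_names, model="gpt-4o-mini"):
    """Ask the LLM for a Gaussian prior (mean, std) over each standardised coefficient."""
    prompt = (
        f"Task: {task_description}\n"
        f"Features (standardised): {', '.join(feature_names)}\n"
        "For a linear model on this task, state your prior belief about each "
        "coefficient as JSON of the form "
        '{"feature": {"mean": <float>, "std": <float>}, ...}. Respond with JSON only.'
    )
    reply = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    ).choices[0].message.content
    parsed = json.loads(reply)  # a robust parser would be needed in practice
    means = np.array([parsed[f]["mean"] for f in feature_names], dtype=float)
    stds = np.array([parsed[f]["std"] for f in feature_names], dtype=float)
    return means, stds


def gaussian_posterior(X, y, prior_mean, prior_std, noise_std=1.0):
    """Conjugate Bayesian linear-regression update with a diagonal Gaussian prior."""
    prior_precision = np.diag(1.0 / prior_std**2)
    post_cov = np.linalg.inv(prior_precision + X.T @ X / noise_std**2)
    post_mean = post_cov @ (prior_precision @ prior_mean + X.T @ y / noise_std**2)
    return post_mean, post_cov
```

In practice one would likely elicit several such priors from paraphrased prompts and combine them, then compare the resulting posterior predictions against in-context learning on held-out data; the repository linked below contains the authors' actual procedure.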
Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.
Link To Code: https://github.com/alexcapstick/llm-elicited-priors
Primary Area: Applications->Health / Medicine
Keywords: LLMs for healthcare, Prior elicitation, LLM-elicited priors
Flagged For Ethics Review: true
Submission Number: 11175