From Narratives to Numbers: Valid Inference Using Language Model Predictions from Verbal Autopsies

Shuxian Fan; Adam Visokay; Kentaro Hoffman; Stephen Salerno; Li Liu; Jeffrey T. Leek; Tyler McCormick

From Narratives to Numbers: Valid Inference Using Language Model Predictions from Verbal Autopsies

Shuxian Fan, Adam Visokay, Kentaro Hoffman, Stephen Salerno, Li Liu, Jeffrey T. Leek, Tyler McCormick

Published: 10 Jul 2024, Last Modified: 26 Aug 2024COLMEveryoneRevisionsBibTeXCC BY 4.0

Research Area: Data, Science of LMs, Inference algorithms for LMs, LMs for everyone

Keywords: multiclass, inference, transportability, public health, verbal autopsy

TL;DR: We show how to perform valid statistical inference when using NLP predictions instead of directly observed data.

Abstract: In settings where most deaths occur outside the healthcare system, verbal autopsies (VAs) are a common tool to monitor trends in causes of death (COD). VAs are interviews with a surviving caregiver or relative that are used to predict the decedent’s COD. Turning VAs into actionable insights for researchers and policymakers requires two steps (i) predicting likely COD using the VA interview and (ii) performing inference with predicted CODs (e.g. modeling the breakdown of causes by demographic factors using a sample of deaths). In this paper, we develop a method for valid inference using outcomes (in our case COD) predicted from free-form text using state-of-the-art NLP techniques. This method, which we call multiPPI++, extends recent work in “prediction-powered inference” to multinomial classification. We leverage a suite of NLP techniques for COD prediction and, through empirical analysis of VA data, we demonstrate the effectiveness of our approach in handling transportability issues. multiPPI++ recovers ground truth estimates, regardless of which NLP model produced predictions and regardless of whether they were produced by a more accurate predictor like GPT-4-32k or a less accurate predictor like KNN. Our findings demonstrate the practical importance of inference correction for public health decision-making and suggests that if inference tasks are the end goal, having a small amount of contextually relevant, high quality labeled data is essential regardless of the NLP algorithm.

Supplementary Material: zip

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the COLM Code of Ethics on https://colmweb.org/CoE.html

Author Guide: I certify that this submission complies with the submission instructions as described on https://colmweb.org/AuthorGuide.html

Submission Number: 496

Loading