From One to Zero: RAG-IM Adapts Language Models for Interpretable Zero-Shot Predictions on Clinical Tabular Data

Published: 10 Oct 2024, Last Modified: 19 Nov 2024. TRL @ NeurIPS 2024 Poster. License: CC BY 4.0
Keywords: interpretable models, pre-trained language models, machine learning on clinical tabular data, retrieval-augmented generation
TL;DR: We introduce a new method, Retrieval-Augmented Generation of Interpretable Models (RAG-IM), that adapts pre-trained language models for interpretable predictions on clinical tabular data in a zero-shot fashion.
Abstract: Clinical machine learning models, often learned from tabular data, must adapt to new settings such as different hospitals, clinicians, or patient populations. These differing environments present related but subtly distinct tasks, where diseases and medical interventions share common foundations but vary in meaningful ways. In contrast to one-size-fits-all invariant feature learning, we believe that representing meaningful differences between domains, and adapting to those differences, will improve the accuracy, utility, and interpretability of machine learning in health. Here, we introduce Retrieval-Augmented Generation of Interpretable Models (RAG-IM), a highly performant method for adapting statistical models trained on tabular data to new domains based on their descriptions. Leveraging the strengths of Retrieval-Augmented Generation (RAG), our framework retrieves relevant models from related tasks and combines them with contextual insights from pre-trained language models. RAG-IM generates task-specific, interpretable models that perform reliably even in few-shot and zero-shot scenarios where data are limited or completely unavailable. Through experiments on 7487 related tasks, we find that RAG-IM is a promising general-purpose platform that extends model-based analysis to data-limited and heterogeneous regimes by connecting statistical analysis with natural language.
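To make the retrieve-and-combine idea in the abstract concrete, here is a minimal illustrative sketch (not the paper's actual algorithm): a store of interpretable linear models from related tasks, each indexed by an embedding of its task description, from which a zero-shot model for a new task is formed as a similarity-weighted average of retrieved coefficients. All names, embeddings, and coefficients below are hypothetical.

```python
import math

# Hypothetical store of interpretable models from related clinical tasks.
# Each entry pairs a task-description embedding with linear-model coefficients.
MODEL_STORE = [
    {"embedding": [0.9, 0.1, 0.0], "coefs": {"age": 0.8, "bmi": 0.3}},
    {"embedding": [0.7, 0.3, 0.1], "coefs": {"age": 0.6, "glucose": 0.5}},
    {"embedding": [0.0, 0.2, 0.9], "coefs": {"bmi": -0.4, "glucose": 0.9}},
]

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def zero_shot_model(task_embedding, store, k=2):
    """Retrieve the k most similar stored tasks and combine their
    coefficients, weighted by similarity, into a new interpretable model.
    No data from the target task is used -- hence 'zero-shot'."""
    ranked = sorted(
        store,
        key=lambda m: cosine(task_embedding, m["embedding"]),
        reverse=True,
    )[:k]
    weights = [cosine(task_embedding, m["embedding"]) for m in ranked]
    total = sum(weights) or 1.0
    combined = {}
    for w, m in zip(weights, ranked):
        for feat, coef in m["coefs"].items():
            combined[feat] = combined.get(feat, 0.0) + (w / total) * coef
    return combined

# A new task whose description embedding resembles the first two tasks.
model = zero_shot_model([0.8, 0.2, 0.05], MODEL_STORE)
```

In RAG-IM itself, a pre-trained language model additionally contributes contextual knowledge about the new task's description; this sketch shows only the retrieval-and-combination half of that pipeline.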
Submission Number: 58