Latent Feature Mining with Large Language Models

Bingxuan Li; Pengyi Shi

Published: 07 Mar 2025, Last Modified: 25 Mar 2025. Venue: GenAI4Health Poster. License: CC BY 4.0

Keywords: AI for Healthcare; Machine Learning for Healthcare; large language models; feature extraction

TL;DR: A framework that augments observed features with latent features using large language models, enhancing the predictive power of ML models in downstream tasks.

Abstract: Predictive modeling often encounters significant challenges in domains with limited data availability and quality. This is particularly true in areas like healthcare, where collected features may be weakly correlated with outcomes, and gathering additional features is constrained by ethical considerations or practical limitations. Traditional machine learning (ML) models struggle to incorporate unobserved yet critical factors. In this work, we introduce an effective approach that formulates latent feature mining as text-to-text propositional logical reasoning. We propose FLAME (Faithful Latent FeAture Mining for Predictive Model Enhancement), a framework that leverages large language models (LLMs) to augment observed features with latent features and enhance the predictive power of ML models in downstream tasks. Our framework generalizes across domains given the necessary domain-specific adaptation: it is designed to incorporate contextual information unique to each area, which supports transfer to other areas facing similar data availability challenges. We validate our framework with a case study using the MIMIC data. Our results show that the inferred latent features significantly improve the performance of the downstream classifier, by over 10%.

Submission Number: 73
