Unlocking Latent Medical Reasoning in LLMs via Inference-Time Representation and Prefix Interventions
Keywords: Medical Reasoning, Representation Engineering, Prefix Tuning, Large Language Models, Data Efficiency
Abstract: Recent reasoning advances in large language models (LLMs) have broadened their applicability to medical tasks.
Yet most prior work remains dependent on scarce, high-quality rationales and compute-intensive post-training, with limited exploration of how to leverage the medical capabilities acquired during pretraining.
Consequently, a key challenge is how to elicit these latent capabilities in a data-efficient manner.
To address this gap, we introduce RIPT, a lightweight framework for data-efficient capability activation.
RIPT explicitly decomposes its objective into two complementary components: reasoning enhancement and medical knowledge elicitation.
For the former, we extract steering vectors from hidden activations on a small set of high-quality paired reasoning/direct responses to shape LLMs' reasoning behavior.
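The submission does not include code; as an illustrative sketch, a steering vector of this kind is commonly built as the difference in mean hidden activations between the two response styles (function name, shapes, and the unit-normalisation step are assumptions, not details from the paper):

```python
import numpy as np

def steering_vector(reasoning_acts, direct_acts):
    """Difference-in-means steering vector from paired hidden activations.

    reasoning_acts, direct_acts: arrays of shape (n_pairs, hidden_dim),
    hidden states collected at one layer for reasoning-style vs. direct
    responses to the same prompts.
    """
    v = reasoning_acts.mean(axis=0) - direct_acts.mean(axis=0)
    # Unit-normalise so the injection strength can be tuned separately.
    return v / np.linalg.norm(v)
```

In this construction, adding a scaled copy of the vector to a layer's hidden states at inference nudges generations toward the reasoning style.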
For the latter, we obtain prefix vectors via prefix tuning on simple medical QA pairs to elicit domain-specific knowledge.
At inference, we freeze the backbone LLM and apply a hybrid intervention that jointly injects both steering and prefix vectors.
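A simplified sketch of such a hybrid intervention at one frozen layer, with the prefix vectors prepended to the layer's input sequence and the steering vector added to every position (in practice prefix tuning usually prepends to attention keys/values; all names, shapes, and the scalar `alpha` here are illustrative assumptions):

```python
import numpy as np

def hybrid_intervention(hidden, steer_vec, prefix, alpha=1.0):
    """Inference-time intervention on one frozen layer's inputs.

    hidden:    (seq_len, d) hidden states at the chosen layer
    steer_vec: (d,) steering vector shifting the reasoning direction
    prefix:    (p, d) learned prefix vectors eliciting domain knowledge
    Returns a (p + seq_len, d) array: prefix prepended, steering added.
    """
    steered = hidden + alpha * steer_vec  # broadcast over positions
    return np.concatenate([prefix, steered], axis=0)
```

The backbone weights are never updated; only the small steering and prefix vectors are stored and injected.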
Experiments under limited-resource settings show that RIPT consistently outperforms strong baselines, suggesting an efficient pathway for unlocking LLMs’ medical reasoning capabilities.
Paper Type: Long
Research Area: Clinical and Biomedical Applications
Research Area Keywords: representation learning, parameter-efficient-training, data-efficient training, healthcare applications, clinical NLP, biomedical QA, reasoning
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Approaches to low-resource settings, Approaches to low-compute settings / efficiency
Languages Studied: English
Submission Number: 2218