Prompting Large Language Models for fMRI-Based Brain Semantic Decoding

Anna Sato, Ichiro Kobayashi

Published: 01 Jan 2026, Last Modified: 11 Mar 2026, Source: Crossref, License: CC BY-SA 4.0

Abstract: Brain decoding, which aims to reconstruct perceptual experiences or cognitive states from neural activity, has advanced significantly with the integration of large language models (LLMs). This study investigates the impact of different LLM architectures (GPT [11] and Llama-3), model scales (1B to 70B parameters for Llama-3), and instructional prompting on semantic decoding accuracy from fMRI data. Using a public fMRI dataset in which participants listened to speech narratives or viewed silent movies, we adapted a previously established decoding framework [15]. Our findings reveal that Llama-3 generally exhibited superior brain activity encoding compared to GPT models. In decoding, the impact of prompts was particularly critical for Llama-3 models. For the speech listening task, Fine-tuned GPT, which was used in prior research, demonstrated strong baseline accuracy, whereas Llama-3 models without prompts performed considerably worse. However, with instructional prompts that guided and constrained the model’s output toward more task-relevant content, Llama-3’s performance improved substantially, achieving results comparable to Fine-tuned GPT across several metrics. Furthermore, on the more challenging movie dataset, which required decoding silent visual narratives using an encoding model trained exclusively on auditory speech, prompted Llama-3 models frequently outperformed GPT models. While the lower movie decoding accuracy highlighted the limits of cross-task generalization, the performance of prompted LLMs demonstrates significant potential and underscores the need for more versatile encoding strategies in brain decoding.

External IDs: doi:10.1007/978-981-95-4378-6_27