Keywords: brain encoding, fMRI, lightweight language models, large language models, quantization, linguistic properties, FlashHolmes benchmark
Abstract: Recent studies have shown that full-precision Transformer-based large language models (LLMs) increasingly improve predictions of human brain activity as model parameters are scaled. However, the corresponding growth in size and computational cost limits their interpretability and practical deployment, particularly in applications such as brain-computer interfaces (BCIs), which demand low-latency, efficient models. To address this gap, two efficiency-oriented approaches have emerged: (i) adopting small language models (SLMs), which achieve competitive performance at substantially lower computational cost, and (ii) compressing LLMs through quantization, which reduces computational demands while retaining much of their original capacity.
However, it remains unclear whether such SLMs or compressed LLMs can effectively capture brain-relevant representations and achieve brain alignment comparable to that of full-precision LLMs. Specifically, our study is motivated by four key questions: (i) can compressed LLMs preserve brain alignment, which is critical for deployment; (ii) how do compressed LLMs compare to SLMs, informing trade-offs for practical applications; (iii) if ultra-low-resource applications demand even smaller footprints, can compressed SLMs still maintain alignment with brain activity; and (iv) which aspect of linguistic competence (discourse, morphology, syntax, semantics, or reasoning) most strongly influences brain alignment as model size or quantization method varies? In this work, we systematically evaluate LLMs (7B), their quantized counterparts, and SLMs (1B and 3B) with their quantized variants to assess how model scale and compression jointly affect brain alignment, using fMRI recordings collected during naturalistic story listening. Our findings indicate that 3B SLMs achieve brain prediction performance comparable to both full-precision and compressed LLMs across whole-brain and core language regions. In contrast, 1B SLMs show a significant drop in brain alignment, particularly in semantic-processing regions. Notably, while most quantization methods preserve alignment, GPTQ quantization reduces brain alignment across both LLMs and SLMs. Finally, benchmarking with the FlashHolmes suite shows that quantization primarily degrades discourse, syntax, and morphology, while leaving overall brain alignment intact.
Primary Area: applications to neuroscience & cognitive science
Submission Number: 2598