Keywords: large language models, instruction-tuning, world knowledge, neuroscience, neuroAI
TL;DR: This paper explores how instruction-tuning affects language models from a neuroscientific perspective, revealing that it generally improves their alignment with human brain activity, with model size and world knowledge playing key roles.
Abstract: Instruction-tuning is a widely adopted finetuning method that enables large language models (LLMs) to generate output that more closely resembles human responses to natural language queries, in many cases leading to human-level performance on diverse testbeds. However, it remains unclear whether instruction-tuning truly makes LLMs more similar to how humans process language. We investigate the effect of instruction-tuning on LLM-human similarity in two ways: (1) brain alignment, the similarity of LLM internal representations to neural activity in the human language system, and (2) behavioral alignment, the similarity of LLM and human behavior on a reading task. We assess 25 vanilla and instruction-tuned LLMs on three datasets of humans reading naturalistic stories and sentences, and find that instruction-tuning generally enhances brain alignment, by an average of 6%, but does not have a similar effect on behavioral alignment. To identify the factors underlying LLM-brain alignment, we compute the correlation between the brain alignment of LLMs and various model properties, such as model size, performance on problem-solving benchmarks, and performance on benchmarks requiring world knowledge across various domains. Notably, we find a strong positive correlation between brain alignment and model size (r = 0.95), as well as performance on tasks requiring world knowledge (r = 0.81). Our results demonstrate that instruction-tuning improves both the world knowledge representations of LLMs and their alignment with the human brain, suggesting that mechanisms that encode world knowledge in LLMs also improve representational alignment to the human brain.
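The abstract reports correlation coefficients (r) between per-model brain alignment and model properties such as size or world-knowledge benchmark scores. As a minimal sketch, assuming r is a Pearson correlation computed across models (one brain-alignment score and one property value per model), it could be computed as follows. The function name `pearson_r` and any inputs are hypothetical illustrations, not the authors' code or data.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two equal-length sequences.

    Hypothetical helper: xs could hold one model property per LLM
    (e.g. a benchmark score) and ys the brain-alignment score of the same LLM.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance numerator and the two standard-deviation terms (unnormalized).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for zero-variance input")
    return cov / (sd_x * sd_y)


if __name__ == "__main__":
    # Toy placeholder values only; these are NOT results from the paper.
    property_values = [1.0, 2.0, 3.0, 4.0]
    alignment_scores = [2.0, 4.0, 6.0, 8.0]
    print(pearson_r(property_values, alignment_scores))
```

Whether model size enters the correlation raw or log-transformed is not stated in the abstract; the helper simply correlates whatever per-model values it is given.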
Submission Number: 94