Hierarchical Frequency Tagging Probe (HFTP): A Unified Approach to Investigate Syntactic Structure Representations in Large Language Models and the Human Brain
Large Language Models (LLMs) have shown impressive capabilities across a range of language tasks. However, questions remain about whether LLMs effectively encode linguistic structures such as phrases and sentences, and how closely these representations align with those in the human brain. Here, we introduce the Hierarchical Frequency Tagging Probe (HFTP) to assess phrase and sentence representations in LLMs and the human brain in a unified manner. HFTP uses frequency-domain analysis to identify which LLM computational modules (multilayer perceptron (MLP) neurons) or human cortical areas encode phrases or sentences. Human brain activity is recorded using intracranial electrodes. The results reveal distinct sensitivities to sentences and phrases across layers of multiple LLMs (GPT-2, Gemma, Llama 2, Llama 3.1, and GLM-4) and across different regions of the human brain. Notably, while LLMs tend to process sentences and phrases within similar layers, the human brain engages distinct regions for these two syntactic levels. Additionally, representational similarity analysis (RSA) shows that the syntactic representations of all five LLMs align more closely with neural representations in the left hemisphere, the dominant hemisphere for language processing. Among the LLMs, GPT-2 and Llama 2 show the greatest similarity to human brain syntactic representations, while Llama 3.1 demonstrates a weaker resemblance. Overall, our findings provide deeper insights into syntactic processing in LLMs and highlight the effectiveness of HFTP as a versatile tool for detecting syntactic structures across diverse LLM architectures and parameter scales, as well as in parallel analyses of human brains and LLMs, thereby bridging computational linguistics and cognitive neuroscience.
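To make the frequency-tagging idea behind HFTP concrete, the following is a minimal, hypothetical sketch (not the paper's implementation): when words are presented at a fixed rate, a unit that tracks four-word sentences should show a spectral peak at one quarter of the word rate, and a phrase-tracking unit at one half. The simulated "sentence unit" below stands in for a real MLP neuron's activation time series; all parameter values are illustrative assumptions.

```python
import numpy as np

# Assumed presentation rate: 4 words per second, one activation sample per word.
word_rate = 4.0
n_words = 1024
t = np.arange(n_words) / word_rate

# Simulated "sentence unit": oscillates at word_rate / 4 (one cycle per
# four-word sentence), plus noise. A real analysis would use an LLM's
# MLP activations instead of this synthetic signal.
rng = np.random.default_rng(0)
sentence_unit = (np.sin(2 * np.pi * (word_rate / 4) * t)
                 + 0.3 * rng.standard_normal(n_words))

# Frequency-domain analysis: power spectrum of the activation series.
power = np.abs(np.fft.rfft(sentence_unit)) ** 2
freqs = np.fft.rfftfreq(n_words, d=1.0 / word_rate)

# The dominant (non-DC) frequency tags the unit as sentence-sensitive.
peak_freq = freqs[np.argmax(power[1:]) + 1]
print(peak_freq)  # peak at word_rate / 4 = 1.0 Hz
```

Applied per neuron and per layer, comparing spectral power at the sentence and phrase rates yields the layer-wise sensitivity profiles the abstract describes; the same spectral measure can be computed on intracranial recordings, which is what makes the probe unified across LLMs and the brain.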