Keywords: Structural Probing, Large Language Models, Backus-Naur Form, Linguistic Framework, AI Interpretability
TL;DR: The paper presents a probing tool that compares LLM embeddings to formal linguistic frameworks like Lexical-Functional Grammar, Categorial Grammar, and Head-Driven Phrase Structure Grammar.
Abstract: The paper introduces a novel probing tool that uses linguistic frameworks, such as Lexical-Functional Grammar (LFG), Categorial Grammar (CG), and Head-Driven Phrase Structure Grammar (HPSG), to explore the correspondence between large language model (LLM) embeddings and formal language descriptions. The method constructs a graph from LLM embeddings and compares it to representations generated from a linguistic framework description written in Backus-Naur Form (BNF). By identifying intersections between the graphs, the method assesses the similarity between the LLM's internal structure and formal linguistic theories. The findings suggest that while LLM representations do not fully correspond to any linguistic framework, they offer insights into language structure that go beyond traditional theories, because LLMs' knowledge is derived from the real-world language data they are trained on. This idea calls existing linguistic theories into question and provides new methods for verifying and refining linguistic hypotheses.
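The graph comparison described in the abstract can be illustrated with a minimal sketch. The abstract does not say how either graph is built or how intersections are scored, so everything below is an assumption made for illustration: the toy 2-D vectors, the cosine-similarity threshold, the rule "link a left-hand symbol to each symbol in its productions", the Jaccard score, and all function names are hypothetical and are not taken from the paper.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def embedding_graph(embeddings, threshold):
    """Assumed construction: undirected edge between two labels whose
    vectors have cosine similarity >= threshold."""
    edges = set()
    for a, b in combinations(sorted(embeddings), 2):
        if cosine(embeddings[a], embeddings[b]) >= threshold:
            edges.add(frozenset((a, b)))
    return edges

def bnf_graph(bnf_text):
    """Assumed construction: undirected edge from each left-hand symbol
    to every symbol appearing in any of its alternatives."""
    edges = set()
    for line in bnf_text.strip().splitlines():
        lhs, rhs = line.split("::=")
        lhs = lhs.strip()
        for alternative in rhs.split("|"):
            for symbol in alternative.split():
                if symbol != lhs:
                    edges.add(frozenset((lhs, symbol)))
    return edges

def edge_overlap(g1, g2):
    """Jaccard similarity of two edge sets (an assumed scoring choice)."""
    union = g1 | g2
    return len(g1 & g2) / len(union) if union else 0.0

# Hypothetical toy grammar; a real LFG, CG, or HPSG description is far larger.
TOY_BNF = """
<S> ::= <NP> <VP>
<NP> ::= <Det> <N>
<VP> ::= <V> <NP>
"""

# Hypothetical per-category vectors standing in for aggregated LLM embeddings.
TOY_EMBEDDINGS = {
    "<S>": (1.0, 0.0),
    "<NP>": (0.9, 0.3),
    "<VP>": (0.0, 1.0),
    "<V>": (0.1, 1.0),
    "<Det>": (1.0, 0.1),
    "<N>": (-1.0, 0.0),
}

emb_edges = embedding_graph(TOY_EMBEDDINGS, threshold=0.9)
bnf_edges = bnf_graph(TOY_BNF)
shared = emb_edges & bnf_edges   # edges found in both graphs
score = edge_overlap(emb_edges, bnf_edges)

print("shared edges:", sorted(tuple(sorted(e)) for e in shared))
print("overlap score:", round(score, 3))
```

Under these assumptions, a score below 1 reflects the abstract's claim that the two structures only partially coincide, and edges present only in the embedding graph are the kind of candidates one might inspect for structure that the formal description does not capture.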
Track: Main track
Submitted Paper: No
Published Paper: No
Submission Number: 76