ALF: A Fine-Grained French Analogical Dataset for Evaluating Lexical Knowledge of Large Language Models

Alexander Petrov, Antoine Venant, François Lareau, Yves Lepage, Philippe Langlais

Published: 2025 · Last Modified: 18 Mar 2026 · ECAI 2025 · CC BY-SA 4.0
Abstract: The undeniable revolution brought forth by Large Language Models (LLMs) stems from the remarkable fluency of the texts they generate: they appear to master language with human-like finesse. This fluency raises a key scientific question: how much lexical knowledge do LLMs actually capture in order to produce such fluent language? To address it, we present ALF, a freely available analogical dataset for French, endowed with rich lexicographic information grounded in Meaning-Text Theory. It comprises 2600 fine-grained lexical analogies with which we evaluate the lexical abilities of five off-the-shelf LLMs, namely ChatGPT-4o mini, Llama3.0-8B, Llama3.1-8B, Qwen2.5-14B, and Mistral7B. Their performance ranges from 45% for Mistral, through about 55% for the ChatGPT and Llama models, to nearly 60% for Qwen2.5-14B, qualifying ALF as a challenging dataset. Experimenting with larger models (OpenAI o1, Llama3.0/3.1-70B, and Qwen2.5-32B) yields rather limited returns given the drastic increase in computational cost. We further identify certain types of analogies and prompting methods that reveal performance disparities.