Q&A-LF: A French Question-Answering Benchmark for Measuring Fine-Grained Lexical Knowledge

Alexander Petrov, Alessandra Thais Mancas, Viviane Binet, Antoine Venant, François Lareau, Yves Lepage, Phillippe Langlais

Published: 2025, Last Modified: 18 Mar 2026, RANLP 2025, CC BY-SA 4.0
Abstract: We introduce Q&A-LF, a French question-answering benchmark designed to assess the extent to which large language models capture fine-grained lexical knowledge. We investigate the ability of ChatGPT-4o mini, Qwen2.5-14B, Llama3.0-8B, and Llama3.1-8B to answer questions based on lexical functions (LFs) from Meaning-Text Theory. Using several prompting setups that vary the number of examples and the amount of context provided, we find that Qwen and ChatGPT generally outperform the Llama models, achieving up to 70% accuracy, while the Llama models reach just above 60%. We identify LFs that are particularly easy or especially challenging for the models. We further investigate whether providing sentence-level context and one-shot prompting improve performance, especially on semantically complex functions.