Keywords: Formal Language; Large Language Models; Benchmark
TL;DR: We propose a formal language benchmark for evaluating large language models.
Abstract: Empirical research has guided the progress of large language models (LLMs) over the years, yet we often have only a limited understanding of the data fed to them. We take an orthogonal approach to the problem and propose a formal language benchmark for studying LLMs.
We ask the following questions: (a) Why do we need formal languages as a test bed for studying LLMs? and (b) How do we measure the language proficiency of an LLM? As contributions, we highlight the precision and controllability of probabilistic formal languages, which make them well suited for studying LLMs. Moreover, we contrast a generative test with a discriminative test for determining the
language proficiency of an LLM, where only the latter is comparable across LLMs.
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 112