Logic-Level Evaluation of Logical Table-to-Text Generation

Published: 18 May 2026, Last Modified: 18 May 2026, CoNLL 2026 Archival, CC BY 4.0
Keywords: Logical table to text generation, operation-aware evaluation, evaluation metrics, diagnostic framing
TL;DR: We propose an operation-aware diagnostic framework for evaluating LLMs on logical table-to-text generation.
Abstract: Logical Table-to-Text (LT2T) generation aims to produce natural-language sentences that are logically faithful to structured tabular data. Although recent Large Language Models (LLMs) score highly on aggregate fidelity metrics, these scores give only a coarse view of performance: they obscure reasoning failures specific to individual logic types as well as models' meta-logical awareness. We propose an operation-aware diagnostic framework that evaluates four core competencies: (1) Logical Form (LF) execution accuracy, (2) fidelity of LF-conditioned generation, (3) logic-type identification, and (4) LF-free generation. We apply this framework to a suite of frontier LLMs and perform fine-grained analysis across logic types such as aggregation, ordinal, and superlative reasoning. Our results show that LT2T fidelity assessment can be unstable: the choice of verifier and logic type can substantially alter conclusions and model rankings. Crucially, we identify a meta-logical gap: models often generate faithful statements while failing to identify the underlying operation.
Supplementary Material: zip
Scope Confirmation: To the best of my judgment, this submission falls within the scope of CoNLL.
Primary Area Selection: Resources and Tools for Scientifically Motivated Research
Secondary Area Selection: Theoretical Analysis and Interpretation of ML Models for NLP
Use Of Generative Artificial Intelligence Tools: Yes, for editing/proofreading the manuscript
Data Collection From Human Subjects: Yes, with details included in the main paper or in an appendix on (1) how the data was obtained, (2) how participants were recruited and paid, (3) how consent was obtained, and (4) whether an IRB protocol was approved for this study. Note that providing this information is obligatory.
Submission Type: Archival: I certify that the submission has not been previously published, nor is the material in it under review by another journal or conference. Further, no material in it will be submitted for review at another conference or journal while under review by CoNLL 2026.
Submission Number: 192