Keywords: Large Language Models, Generative AI in Journalism, Data-Driven News Reporting, Analytical Calibration, Responsible AI
Abstract: Generative AI and large language models (LLMs) are increasingly integrated into data-driven journalism workflows, enabling automated report generation from structured data sources. While prior work has emphasized fluency, coherence, and reasoning performance, far less attention has been paid to the analytical calibration of the claims produced by LLM-based systems. In data-centric news contexts, miscalibrated interpretations—such as overstated statistical significance or unsupported causal claims—can distort public understanding and reduce trust in AI-assisted journalism. This extended abstract proposes a research framework for evaluating and improving the analytical calibration of LLM-generated news reports. Rather than presenting empirical findings, we outline a structured evaluation methodology that measures alignment between numerical evidence (e.g., statistical test outputs, effect sizes, confidence intervals) and the corresponding natural-language claims. We propose metrics to quantify overstatement, under-confidence, and statistical inconsistency, and describe an experimental design for controlled evaluation under varying data conditions. The goal of this work is to establish a principled foundation for assessing and improving calibration in LLM-assisted news generation systems, contributing to the responsible deployment of generative AI in journalism.
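To make the proposed notion of calibration concrete, a metric of this family could take the following shape: map a claim's hedging language and the cited statistical evidence onto ordinal strength scales, and report their difference. The lexicon, thresholds, and function names below are illustrative assumptions for exposition, not the framework's actual metrics.

```python
# Hypothetical calibration-gap sketch: compare the rhetorical strength of a
# generated claim against the strength of the statistical evidence behind it.

# Ordinal strength assigned to common assertion cues (assumed lexicon).
CLAIM_STRENGTH = {
    "may be associated with": 1,
    "is associated with": 2,
    "causes": 3,
}

def evidence_strength(p_value: float, effect_size: float) -> int:
    """Map test output to an ordinal evidence level (illustrative thresholds)."""
    if p_value < 0.01 and abs(effect_size) >= 0.5:
        return 2
    if p_value < 0.05:
        return 1
    return 0

def calibration_gap(claim: str, p_value: float, effect_size: float) -> int:
    """Positive gap = overstatement; negative = under-confidence; 0 = aligned."""
    strength = next((s for cue, s in CLAIM_STRENGTH.items() if cue in claim), 0)
    return strength - evidence_strength(p_value, effect_size)

# A causal claim backed only by a marginal, small-effect result is flagged
# as overstated (positive gap).
print(calibration_gap("Screen time causes lower test scores",
                      p_value=0.04, effect_size=0.1))  # → 2
```

In practice, the claim-strength side would be extracted by an NLP component rather than a fixed lexicon, but the gap formulation illustrates how overstatement and under-confidence can share one signed measure.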
Submission Number: 2