Keywords: LLM, prompt engineering, context engineering, NLP
Abstract: Large language models (LLMs) have demonstrated remarkable potential across a broad range of applications. However, producing reliable text that faithfully represents data remains a challenge. While prior work has shown that task-specific conditioning through in-context learning and knowledge augmentation can improve performance, LLMs continue to struggle with interpreting and reasoning about numerical data. To address this, we introduce wordalisations, a methodology for generating stylistically natural narratives from data. Much like how visualisations display numerical data in a way that is easy to digest, wordalisations abstract data insights into descriptive texts. To illustrate its versatility, we apply our method to three application areas: scouting football players, personality tests, and international survey data. In the absence of standardised benchmarks for this task, we conduct LLM-as-a-judge and human-as-a-judge evaluations to assess accuracy across the three applications. We find that the wordalisation method reduces misrepresentation of the data and shows potential to improve communication about data. We further describe best-practice methods for open and transparent development of communication about data.
Paper Type: Long
Research Area: Natural Language Generation
Research Area Keywords: LLM/AI agents, prompting, safety and alignment, human evaluation, automatic evaluation, few-shot generation, analysis, domain adaptation, data-to-text generation
Contribution Types: NLP engineering experiment, Publicly available software and/or pre-trained models
Languages Studied: gemini-2.5-flash, gpt-4o-mini
Submission Number: 9750