Abstract: Highlights
• Factual accuracy problems limit the usefulness of neural solutions for complex data-to-text.
• Existing evaluation methods miss many of these errors, such as hallucination.
• We propose and evaluate a gold standard protocol for detecting factual errors in generated text.
• We show how this gold standard can be used to measure the efficacy of other methods.
• We also explore the common types of error in both human-authored and neural data-to-text systems.