E2E Refined Dataset

Published: 2023, Last Modified: 24 May 2024. Venue: O-COCOSDA 2023. License: CC BY-SA 4.0
Abstract: As a well-known meaning representation (MR)-to-text dataset, the E2E dataset has been used by many studies in natural language generation. However, the dataset suffers from many deletion, insertion, and substitution errors in its MR-text pairs, which affect the quality of MR-to-text systems trained on it. In this paper, we develop a refined dataset by fixing text and MR errors, applying text normalization, and adding extra annotations to the MR part. We release Python code on GitHub to convert the original E2E dataset into the refined one.
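To make the notion of a deletion error concrete, here is a minimal sketch of how an MR-text pair could be checked for slot values that the text never mentions. It is not the authors' released conversion code. The function names `parse_mr` and `unrealized_slots` are hypothetical, the example pair is invented for illustration, and the verbatim substring match is a deliberately naive assumption. Only the `slot[value]` notation comes from the original E2E dataset.

```python
from __future__ import annotations

import re

# Matches one "slot[value]" item in an E2E-style MR string.
SLOT_PATTERN = re.compile(r"(\w+)\[([^\]]*)\]")


def parse_mr(mr: str) -> dict[str, str]:
    """Parse 'name[The Eagle], eatType[coffee shop]' into a slot-to-value dict."""
    return {slot: value.strip() for slot, value in SLOT_PATTERN.findall(mr)}


def unrealized_slots(mr: str, text: str) -> list[str]:
    """Return slots whose value does not appear verbatim in the text.

    Naive assumption: a slot counts as realized only if its value occurs
    as a case-insensitive substring of the text.
    """
    lowered = text.lower()
    return [
        slot
        for slot, value in parse_mr(mr).items()
        if value.lower() not in lowered
    ]


# Invented example pair: the text omits the "area" slot (a deletion error).
mr = "name[The Eagle], eatType[coffee shop], area[riverside]"
text = "The Eagle is a coffee shop."
print(unrealized_slots(mr, text))  # ['area']
```

A substring check like this would misfire on slots that are usually paraphrased rather than quoted, such as `familyFriendly[yes]`, so it only illustrates the kind of mismatch involved and is not a usable detector.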