Teach Me to Explain: A Review of Datasets for Explainable Natural Language Processing

Sarah Wiegreffe; Ana Marasovic

Published: 29 Jul 2021, Last Modified: 26 May 2025

NeurIPS 2021 Datasets and Benchmarks Track (Round 1)

Readers: Everyone

Keywords: NLP, explainability, explainable NLP, explainable AI, XAI

TL;DR: We identify datasets with three classes of textual explanations, organize the literature on annotating each type, identify strengths and shortcomings of existing collection methodologies, and give recommendations for collecting explanations in the future.

Abstract: Explainable Natural Language Processing (ExNLP) has increasingly focused on collecting human-annotated textual explanations. These explanations are used downstream in three ways: as data augmentation to improve performance on a predictive task, as supervision to train models to produce explanations for their predictions, and as ground truth to evaluate model-generated explanations. In this review, we identify 65 datasets with three predominant classes of textual explanations (highlights, free-text, and structured), organize the literature on annotating each type, identify strengths and shortcomings of existing collection methodologies, and give recommendations for collecting ExNLP datasets in the future.

Supplementary Material: zip

URL: https://exnlpdatasets.github.io/

Contribution Process Agreement: Yes

Dataset Url: https://exnlpdatasets.github.io/

Dataset Embargo: N/A

License: N/A

Author Statement: Yes

Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/teach-me-to-explain-a-review-of-datasets-for/code)
