# <img src="https://github.com/jungokasai/THumB/blob/master/figs/thumb.png" height="40" alt="thumb-up"> Transparent Human Benchmark (THumB)<br/> for Natural Language Generation


<p align="center">
<a href="https://allenai.org/">
<img src="https://github.com/jungokasai/THumB/blob/master/figs/ai2_logo.png" height="100" alt="AI2 Logo" style="padding-right:160">
</a>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
<a href="https://www.cs.washington.edu/research/nlp">
<img src="https://github.com/jungokasai/THumB/blob/master/figs/uwnlp_logo.png" height="70" alt="UWNLP Logo">
</a>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
<a href="https://yale-lily.github.io/">
<img src="https://raw.githubusercontent.com/Yale-LILY/SummEval/master/assets/logo-lily.png" height="50" alt="LILY Logo" style="padding-right:160">
</a>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
<a href="https://www.salesforce.com/">
<img src="https://raw.githubusercontent.com/Yale-LILY/SummEval/master/assets/logo-salesforce.svg" height="50" alt="Salesforce Logo">
</a>
</p>

## Introduction
Human evaluations of language generation tasks facilitate the development of both generation models and automatic metrics. We provide <img src="https://github.com/jungokasai/THumB/blob/master/figs/thumb.png" height="15" alt="thumb-up"> THumB (**T**ransparent **Hum**an **B**enchmark) scores for the generation tasks listed below. More tasks may be added in the future.
- [MSCOCO Image Captioning](https://github.com/jungokasai/THumB/tree/master/mscoco)
- [CNNDM Summarization](https://github.com/jungokasai/THumB/tree/master/cnndm)

## Citations
### MSCOCO Captioning Evaluations and THumB 1.0 Protocol
```
@inproceedings{kasai2022thumb,
    title   = {Transparent Human Evaluation for Image Captioning},
    author  = {Jungo Kasai and Keisuke Sakaguchi and Lavinia Dunagan and Jacob Morrison and Ronan Le Bras and Yejin Choi and Noah A. Smith},
    year    = {2022},
    booktitle = {Proc.\ of NAACL},
    url     = {https://arxiv.org/abs/2111.08940},
}
```
### CNNDM Summarization Evaluations
```
@article{fabbri2021summeval,
    title   = {{SummEval}: Re-evaluating Summarization Evaluation},
    author  = {Fabbri, Alexander R and Kry{\'s}ci{\'n}ski, Wojciech and McCann, Bryan and Xiong, Caiming and Socher, Richard and Radev, Dragomir},
    journal = {TACL},
    year    = {2021},
    url     = {https://arxiv.org/abs/2007.12626},
}
```
