# Code for What are the best systems? New perspectives on NLP Benchmarking

Code for the NeurIPS 2022 submission "What are the best systems? New perspectives on NLP Benchmarking".



## Dataset
We provide our dataset in the `data/` directory.

## Reproducing experiments

For each experiment, we provide an independent notebook.




All experiments were run on a single CPU. Analysing one dataset takes less than an hour per run.

