Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus

2021 (modified: 14 Dec 2021) | EMNLP (1) 2021 | Readers: Everyone
Authors: Jesse Dodge, Maarten Sap, Ana Marasović, William Agnew, Gabriel Ilharco, Dirk Groeneveld, Margaret Mitchell, Matt Gardner. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021.