
The BigScience Corpus: A 1.6TB Composite Multilingual Dataset

Hugo Laurençon, Lucile Saulnier, Thomas Wang, Christopher Akiki, Albert Villanova del Moral, Teven Le Scao, Leandro Von Werra, Chenghao Mou, Eduardo González Ponferrada, Huu Nguyen, Jörg Frohberg, Mario Šaško, Quentin Lhoest, Angelina McMillan-Major, Gérard Dupont, Stella Biderman, Anna Rogers, Loubna Ben allal, Francesco De Toni, Giada Pistilli, Olivier Nguyen, Somaieh Nikpoor, Maraim Masoud, Pierre Colombo, Javier de la Rosa, Paulo Villegas, Tristan Thrush, Shayne Longpre, Sebastian Nagel, Leon Weber, Manuel Romero Muñoz, Jian Zhu, Daniel Van Strien, Zaid Alyafeai, Khalid Almubarak, Vu Minh Chien, Itziar Gonzalez-Dios, Aitor Soroa, Kyle Lo, Manan Dey, Pedro Ortiz Suarez, Aaron Gokaslan, Shamik Bose, David Ifeoluwa Adelani, Long Phan, Ian Yu, Suhas Pai, Violette Lepercq, Suzana Ilic, Margaret Mitchell, Sasha Luccioni, Yacine Jernite

06 Jun 2022, 09:42 (edited 16 Jun 2022)
NeurIPS 2022 Track Datasets and Benchmarks Submission
Readers: Everyone
