Embracing Data Abundance

Ondrej Bajgar, Rudolf Kadlec and Jan Kleindienst

23 Jun 2026 (modified: 17 Feb 2017), ICLR 2017, Readers: Everyone

Abstract: A practically unlimited amount of natural language data is available. Still, recent work in text comprehension has focused on datasets that are small relative to current computing capabilities. This article makes a case for the community to move to larger data and offers the BookTest dataset as a step in that direction.

Conflicts: ibm.com

Keywords: Transfer Learning, Semi-Supervised Learning, Natural language processing, Deep learning

