{"plain_text": {"description": "Large Yelp Review Dataset.\nThis is a dataset for binary sentiment classification. We provide a set of 560,000 highly polar Yelp reviews for training, and 38,000 for testing.\nORIGIN\nThe Yelp reviews dataset consists of reviews from Yelp. It is extracted\nfrom the Yelp Dataset Challenge 2015 data. For more information, please\nrefer to http://www.yelp.com/dataset_challenge\n\nThe Yelp reviews polarity dataset is constructed by\nXiang Zhang (xiang.zhang@nyu.edu) from the above dataset.\nIt was first used as a text classification benchmark in the following paper:\nXiang Zhang, Junbo Zhao, Yann LeCun. Character-level Convolutional Networks\nfor Text Classification. Advances in Neural Information Processing Systems 28\n(NIPS 2015).\n\n\nDESCRIPTION\n\nThe Yelp reviews polarity dataset is constructed by considering stars 1 and 2\nnegative, and 3 and 4 positive. For each polarity 280,000 training samples and\n19,000 testing samples are taken randomly. In total there are 560,000 training\nsamples and 38,000 testing samples. Negative polarity is class 1,\nand positive polarity is class 2.\n\nThe files train.csv and test.csv contain all the training and testing samples,\nrespectively, as comma-separated values. There are 2 columns in them, corresponding to class\nindex (1 and 2) and review text. The review texts are escaped using double\nquotes (\"), and any internal double quote is escaped by 2 double quotes (\"\").\nNew lines are escaped by a backslash followed by an \"n\" character,\nthat is \"\\n\".\n", "citation": "@article{zhangCharacterlevelConvolutionalNetworks2015,\n  archivePrefix = {arXiv},\n  eprinttype = {arxiv},\n  eprint = {1509.01626},\n  primaryClass = {cs},\n  title = {Character-Level {{Convolutional Networks}} for {{Text Classification}}},\n  abstract = {This article offers an empirical exploration on the use of character-level convolutional networks (ConvNets) for text classification. We constructed several large-scale datasets to show that character-level convolutional networks could achieve state-of-the-art or competitive results. Comparisons are offered against traditional models such as bag of words, n-grams and their TFIDF variants, and deep learning models such as word-based ConvNets and recurrent neural networks.},\n  journal = {arXiv:1509.01626 [cs]},\n  author = {Zhang, Xiang and Zhao, Junbo and LeCun, Yann},\n  month = sep,\n  year = {2015},\n}\n\n", "homepage": "https://course.fast.ai/datasets", "license": "", "features": {"text": {"dtype": "string", "id": null, "_type": "Value"}, "label": {"num_classes": 2, "names": ["1", "2"], "id": null, "_type": "ClassLabel"}}, "post_processed": null, "supervised_keys": null, "task_templates": [{"task": "text-classification", "text_column": "text", "label_column": "label"}], "builder_name": "yelp_polarity", "config_name": "plain_text", "version": {"version_str": "1.0.0", "description": "", "major": 1, "minor": 0, "patch": 0}, "splits": {"train": {"name": "train", "num_bytes": 413558837, "num_examples": 560000, "dataset_name": "yelp_polarity"}, "test": {"name": "test", "num_bytes": 27962097, "num_examples": 38000, "dataset_name": "yelp_polarity"}}, "download_checksums": {"https://s3.amazonaws.com/fast-ai-nlp/yelp_review_polarity_csv.tgz": {"num_bytes": 166373201, "checksum": "528f22e286cad085948acbc3bea7e58188416546b0e364d0ae4ca0ce666abe35"}}, "download_size": 166373201, "post_processing_size": null, "dataset_size": 441520934, "size_in_bytes": 607894135}}