Keywords: Story Generation, Story Evaluation, Dataset, Storytelling, NLP, Evaluation, Contrastive Learning, Language Models, Fine-Tuning, Efficiency, Interactive Machine Learning, Narrative, Creativity, Human-Centered AI, Generative Models, World Models, Reader Models
Abstract: Recent advances in large-scale language models (Raffel et al., 2019; Brown et al., 2020) have brought significant qualitative and quantitative improvements in machine-driven text generation. Despite this, the generation and evaluation of machine-generated narrative text remain challenging problems. Objective evaluation of computationally generated stories can be prohibitively expensive, require meticulously annotated datasets, or fail to adequately measure the logical coherence of a generated story's narratological structure. Informed by recent advances in contrastive learning (Radford et al., 2021), we present Contrastive Authoring and Reviewing Pairing (CARP): a scalable, efficient method for performing qualitatively superior, zero-shot evaluation of stories. We show a strong correlation between human evaluations of stories and those of CARP. CARP's outputs correlate more strongly with the corresponding human judgments than do those of language-model-based methods that rely on fine-tuning or prompt engineering. We also present and analyze the Story-Critique Dataset, a new corpus of 1.3 million aligned story-critique pairs derived from over 80,000 stories. We expect this corpus to be of interest to NLP researchers.
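The abstract describes a CLIP-style contrastive pairing of stories with critiques. Below is a minimal sketch of how such a symmetric contrastive (InfoNCE) objective over paired story and critique embeddings could look; it is not the authors' code, and the function name, temperature value, and tensor shapes are illustrative assumptions.

```python
# Hypothetical sketch of a CLIP-style contrastive objective over paired
# story and critique embeddings (illustrative only, not the CARP release).
import torch
import torch.nn.functional as F

def contrastive_pairing_loss(story_emb, critique_emb, temperature=0.07):
    """Symmetric InfoNCE loss: each matched (story, critique) pair in the batch
    is a positive; every other pairing in the batch serves as a negative."""
    story_emb = F.normalize(story_emb, dim=-1)        # (B, D) unit-norm story vectors
    critique_emb = F.normalize(critique_emb, dim=-1)  # (B, D) unit-norm critique vectors
    logits = story_emb @ critique_emb.t() / temperature  # (B, B) cosine-similarity matrix
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_story = F.cross_entropy(logits, targets)         # story -> critique direction
    loss_critique = F.cross_entropy(logits.t(), targets)  # critique -> story direction
    return (loss_story + loss_critique) / 2
```

Under this reading, zero-shot story evaluation would amount to embedding a generated story and a set of candidate critiques with the trained encoders and ranking critiques by cosine similarity, with no task-specific fine-tuning.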
One-sentence Summary: Zero-shot classifiers and high-level analysis of short stories with one end-to-end model
Community Implementations: [2 code implementations](https://www.catalyzex.com/paper/arxiv:2110.03111/code)