PG-Story: Taxonomy, Dataset, and Evaluation for Ensuring Child-Safe Content for Story Generation

ACL ARR 2024 June Submission1449 Authors

14 Jun 2024 (modified: 02 Aug 2024), ACL ARR 2024 June Submission, License: CC BY 4.0
Abstract: Creating children's stories through text generation is a creative task that requires stories to be not only entertaining but also suitable for young audiences. However, current story generation systems rely on pre-trained language models fine-tuned on limited story data, and these systems may not always prioritize child-friendliness. This can lead to the unintended generation of stories containing problematic elements such as violence, profanity, and biases. Despite the significance of these concerns, there is a lack of clear guidelines and benchmark datasets for ensuring content safety for children. In this paper, we introduce a taxonomy specifically tailored to assessing content safety in text, with a strong emphasis on children's well-being. We present PG-Story, a dataset with detailed annotations of both sentence-level and discourse-level safety. We demonstrate the potential of identifying unsafe content through self-diagnosis and of minimizing unsafe elements in generated stories by applying controllable generation techniques during decoding.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: corpus creation, benchmarking, NLP datasets, evaluation
Contribution Types: Publicly available software and/or pre-trained models, Data resources, Data analysis
Languages Studied: English
Submission Number: 1449