Safety Bench: Identifying Safety-Sensitive Situations for Open-domain Conversational Systems

Anonymous

16 Oct 2021 (modified: 05 May 2023), ACL ARR 2021 October Blind Submission, Readers: Everyone

Abstract: The social impact of natural language processing and its applications has received increasing attention. Here, we focus on the problem of safety for end-to-end conversational AI. We survey the problem landscape and introduce a taxonomy of three observed phenomena: the Instigator, Yea-Sayer, and Impostor effects. To help researchers better understand the impact of their conversational models with respect to these scenarios, we present Safety Bench, a suite of open-source tools for quickly assessing safety issues. Finally, we provide extensive analysis of these tools using five popular models and make recommendations for future use.
