VJPrompt: VAE-like Jailbreaking Prompt Strategy to Unmask Deceptive Power of Large Language Models

Anonymous

16 Oct 2023, ACL ARR 2023 October Blind Submission, Readers: Everyone
Abstract: Automatic misinformation detection plays a crucial role in preventing the spread of false information, particularly in the medical field, where individuals without domain expertise may pursue incorrect treatments. While automatic fake news detection methods have proven effective at identifying human-generated fake news articles, the emergence of Large Language Models (LLMs) has introduced new challenges. LLMs can mimic the writing style of authentic news and introduce creative twists on facts, which challenges traditional fake news detection techniques. To assess how well such content can be detected, we first demonstrate that LLMs can generate fake news by introducing a prompt strategy called the variational autoencoder (VAE)-like jailbreak prompt (VJPrompt), which bypasses ethical checks and generates high-quality fake news. We then mix the VJPrompt-generated fake news with real news and human-generated fake news to examine the effectiveness of different fake news detection methods. The results show that detecting VJPrompt-generated fake news remains challenging.
Paper Type: short
Research Area: Resources and Evaluation
Contribution Types: Data resources, Data analysis
Languages Studied: English