Fake News Detection with Retrieval Augmented Generative Artificial Intelligence


KDD 2024 Workshop KiL Submission 17 Authors

01 Jun 2024 (modified: 29 Jun 2024). Submitted to KiL 2024. License: CC BY 4.0

Keywords: Fake News Detection, Sparse Mixture of Experts, Retrieval-Augmented Generation, Large Language Model, Google Search API

TL;DR: This study introduces a method that combines the Mixtral 8x7B LLM with a Retrieval-Augmented Generation framework, using real-time data from Google's search API to offer efficient, cost-effective fake news detection.

Abstract: The rapid spread of false information on social media has become a serious problem that influences public opinion and decision-making. Fake news often propagates faster and more widely than efforts to debunk it or mitigate its effects. Traditional methods for detecting fake news face several challenges, including the need for extensive model training and the potential for inherent biases. Although Large Language Models (LLMs) have improved substantially in recent years, applying them to fake news detection risks producing false or misleading output, because they can hallucinate. This study presents a new strategy to combat fake news by integrating Mixtral 8x7B, a Sparse Mixture of Experts (SMoE) Large Language Model, with a Retrieval-Augmented Generation (RAG) framework. Our framework uses Google's search API to retrieve relevant articles in real time, combining Mixtral's language processing capabilities with RAG's ability to access current information dynamically. Initial results are promising, indicating that our approach performs comparably to established fake news detection techniques. Because our method requires no extensive model training, it offers significant cost savings and contributes to the development of more efficient tools for detecting misinformation and limiting its spread.
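The retrieve-then-classify flow described in the abstract might be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Article` type, the `search` and `generate` callables, the prompt wording, and the REAL/FAKE label set are all hypothetical placeholders. In a real system, `search` would wrap a web search API and `generate` would call a hosted Mixtral 8x7B model; both are stubbed here so the sketch runs offline.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Article:
    """Hypothetical container for one retrieved search result."""
    title: str
    snippet: str
    url: str


def build_prompt(claim: str, articles: List[Article]) -> str:
    """Assemble a RAG-style prompt: retrieved evidence first, then the claim."""
    evidence = "\n".join(
        f"[{i}] {a.title}: {a.snippet} ({a.url})"
        for i, a in enumerate(articles, 1)
    )
    return (
        "You are a fact-checking assistant. Using only the evidence below, "
        "answer with exactly one word: REAL or FAKE.\n\n"
        f"Evidence:\n{evidence or '(no articles retrieved)'}\n\n"
        f"Claim: {claim}\nAnswer:"
    )


def parse_label(reply: str) -> str:
    """Map a free-text model reply onto REAL, FAKE, or UNKNOWN."""
    text = reply.strip().upper()
    if text.startswith("FAKE"):
        return "FAKE"
    if text.startswith("REAL"):
        return "REAL"
    return "UNKNOWN"


def classify_claim(
    claim: str,
    search: Callable[[str], List[Article]],  # assumed wrapper around a search API
    generate: Callable[[str], str],          # assumed wrapper around the LLM
    top_k: int = 5,
) -> str:
    """Retrieve up to top_k articles for the claim, then ask the LLM for a verdict."""
    articles = search(claim)[:top_k]
    return parse_label(generate(build_prompt(claim, articles)))


if __name__ == "__main__":
    # Stubs standing in for the real search API and the real model.
    def stub_search(query: str) -> List[Article]:
        return [Article("Example headline", "Example snippet.", "https://example.com")]

    def stub_generate(prompt: str) -> str:
        return "FAKE"

    print(classify_claim("An example claim to check.", stub_search, stub_generate))
```

Passing `search` and `generate` in as callables keeps retrieval and generation swappable, which fits the abstract's point that the approach needs no model training: only the prompt and the retrieved evidence change per claim.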

Submission Number: 17
