Abstract: Rapid detection of widespread misinformation is one of the most pressing public policy issues to draw on transdisciplinary research in data science. While data science holds considerable potential for addressing the big-data problem of misinformation at an automated scale, it also requires insights from communications and journalism to define quantifiable features that support more accurate misinformation predictions. Currently, the preeminent tools for misinformation detection are large language models (LLMs), which are known for their ability to capture the context and meaning of textual data. However, despite advances in data science models and tools for identifying misinformation, few options are available for evaluating the misinformation potential of news article content. This study proposes TRUExT, an explainable, regression-based data tool that integrates multiple communication-based natural language processing (NLP) dimensions with a base LLM to holistically evaluate the trustworthiness of news articles. Testing of multiple LLMs showed that the RoBERTa LLM available through Hugging Face, with the added NLP dimensions as features, was the most effective foundational model. TRUExT thus offers a potential big-data solution to the growing problem of misinformation, drawing on research at the intersection of data science and communications to capture not only the technical aspects of misinformation prediction but also communication factors present in the data. In the future, the tool could be deployed for use by U.S.-based stakeholders who play an important role in the ongoing information war.
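The general idea of combining a base LLM representation with communication-based NLP dimensions in a regression model can be illustrated with a minimal sketch. This is not the TRUExT implementation: the embedding vectors stand in for RoBERTa outputs, the two extra dimensions per article are hypothetical placeholders, the trust scores are toy values, and the regressor is a plain ridge-style linear model fitted by gradient descent. All function and variable names below are assumptions introduced for illustration.

```python
from typing import List, Sequence, Tuple


def fuse_features(embedding: Sequence[float], nlp_dims: Sequence[float]) -> List[float]:
    """Concatenate an LLM embedding with hand-crafted NLP dimensions."""
    return list(embedding) + list(nlp_dims)


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Linear prediction: dot(w, x) + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def mse(w: Sequence[float], b: float, X: List[List[float]], y: List[float]) -> float:
    """Mean squared error of the linear model on (X, y)."""
    return sum((predict(w, b, xi) - yi) ** 2 for xi, yi in zip(X, y)) / len(y)


def fit_linear(
    X: List[List[float]],
    y: List[float],
    lr: float = 0.1,
    epochs: int = 2000,
    l2: float = 0.01,
) -> Tuple[List[float], float]:
    """Fit an L2-regularized linear regressor by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict(w, b, xi) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # The bias term is left unregularized.
        w = [wj - lr * (gw / n + l2 * wj) for wj, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Toy stand-ins (assumptions): 2-d "embeddings" instead of real RoBERTa vectors,
# and two hypothetical NLP dimensions per article, all scaled to [0, 1].
embeddings = [[0.1, 0.9], [0.8, 0.2], [0.4, 0.5], [0.9, 0.1], [0.2, 0.7]]
nlp_dims = [[0.2, 0.1], [0.9, 0.8], [0.5, 0.4], [0.7, 0.9], [0.1, 0.3]]
trust_scores = [0.9, 0.2, 0.6, 0.1, 0.8]  # toy targets, not real data

X = [fuse_features(e, f) for e, f in zip(embeddings, nlp_dims)]
weights, bias = fit_linear(X, trust_scores)

mse_before = mse([0.0] * len(X[0]), 0.0, X, trust_scores)
mse_after = mse(weights, bias, X, trust_scores)
print(f"training MSE before fitting: {mse_before:.4f}, after: {mse_after:.4f}")
```

Because the model is linear, each learned weight on a fused feature can be read as that feature's contribution to the predicted score, which is one simple way a regression-based tool can remain explainable.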
External IDs: dblp:conf/bigdataconf/DonnerDJE24