Source Attribution of Online News Images by Compression Analysis

Michael Albright, Nitesh Menon, Kristy Roschke, Arslan Basharat

Published: 2021, Last Modified: 13 Aug 2024, WIFS 2021, CC BY-SA 4.0

Abstract: The rapid growth of online disinformation calls for new, robust digital forensics methods for validating the purported sources of multimodal news articles. We surveyed news photojournalists for insight into their workflows. A high percentage (91%) of respondents reported standardized photo publishing procedures, which we hypothesize facilitate source verification. In this work, we demonstrate that online news sites leave predictable and discernible patterns in the compression settings of the images they publish. We propose novel, simple, and very efficient algorithms that analyze image compression profiles for news source verification and identification. We evaluate the algorithms' effectiveness through extensive experiments on a newly released dataset of over 64K images from over 34K articles collected from 30 news sites. The image compression features are modeled by Naive Bayes variants or XGBoost classifiers for source attribution and verification. For these news sources, the proposed algorithms achieve very strong performance: 0.92–0.94 average AUC for source verification in a closed-set scenario, and compelling open-set generalization, with a reduction of only 0.0–0.04 in average AUC.
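To make the modeling idea concrete, the following is a minimal sketch of a categorical Naive Bayes model over discrete compression features. It is not the authors' implementation. The feature set (a quantization-table fingerprint, a chroma subsampling mode, and a progressive/baseline flag), the site names, the toy training rows, and the 0.9 threshold are all hypothetical choices made for illustration only.

```python
import math
from collections import Counter, defaultdict


class CategoricalNaiveBayes:
    """Naive Bayes over discrete features with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, rows, labels):
        n_features = len(rows[0])
        self.class_counts = Counter(labels)
        # counts[label][feature_index][value] -> number of occurrences
        self.counts = defaultdict(lambda: [Counter() for _ in range(n_features)])
        self.vocab = [set() for _ in range(n_features)]
        for row, label in zip(rows, labels):
            for i, value in enumerate(row):
                self.counts[label][i][value] += 1
                self.vocab[i].add(value)
        return self

    def log_joint(self, row, label):
        n = self.class_counts[label]
        total = sum(self.class_counts.values())
        score = math.log(n / total)  # log prior of the source
        for i, value in enumerate(row):
            k = len(self.vocab[i]) + 1  # one extra slot for unseen values
            seen = self.counts[label][i][value]
            score += math.log((seen + self.alpha) / (n + self.alpha * k))
        return score

    def posterior(self, row):
        logs = {c: self.log_joint(row, c) for c in self.class_counts}
        top = max(logs.values())
        weights = {c: math.exp(v - top) for c, v in logs.items()}
        z = sum(weights.values())
        return {c: w / z for c, w in weights.items()}


def verify(model, row, claimed_source, threshold=0.9):
    """Accept the claimed source if its posterior reaches the threshold (assumed value)."""
    return model.posterior(row).get(claimed_source, 0.0) >= threshold


# Hypothetical toy data: (quantization fingerprint, chroma subsampling, scan mode)
rows = [("qt_a", "4:2:0", "progressive")] * 3 + [("qt_b", "4:4:4", "baseline")] * 3
labels = ["site_one"] * 3 + ["site_two"] * 3
model = CategoricalNaiveBayes().fit(rows, labels)

probe = ("qt_a", "4:2:0", "progressive")
post = model.posterior(probe)
accepted = verify(model, probe, "site_one")
rejected = verify(model, probe, "site_two")
```

In this sketch, source identification corresponds to picking the highest-posterior site, and source verification corresponds to thresholding the posterior of the claimed site. Sweeping that threshold is what would produce an ROC curve and hence an AUC figure of the kind reported in the abstract.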