Quantifying Hallucination Bias in AI-Generated Deepfakes: A Multimodal Analysis Using Divergence Metrics

25 Feb 2026 (modified: 11 Mar 2026) · PAKDD 2026 Workshop JENAI · Withdrawn Submission · CC BY 4.0
Keywords: Artificial intelligence hallucinations, deepfakes, generative models, convolutional autoencoders, divergence metrics, FaceForensics++, multimodal analysis, reconstruction error, θ divergence metric, generative model evaluation
TL;DR: We introduce a novel divergence metric (θ) to quantitatively measure hallucination bias in generative models and show that overfitted models deviate more from ground truth than deepfake-trained models.
Abstract: Hallucinations and deepfakes represent two emergent challenges in generative AI. Hallucinations arise from model overfitting or misinterpretation, causing AI systems to produce content with no basis in reality, whereas deepfakes are deliberately generated synthetic media crafted to mimic real-world data. This paper hypothesizes that an overfitted generative model can exhibit output deviations analogous to or exceeding those of a well-regularized deepfake model. To investigate this hypothesis, we train two convolutional autoencoders on the FaceForensics++ dataset: one overfitted on authentic data to induce hallucinations, and one regularized on manipulated data to emulate deepfake generation. We introduce a novel divergence metric θ, defined as the ratio of reconstruction errors between the hallucination model and the deepfake model for the same input, enabling direct quantitative comparison. Experimental evaluation includes classification accuracy, latent space visualization, statistical testing, and complementary divergence metrics such as Fréchet Inception Distance and Structural Similarity Index. Results show that while both models achieve high classification accuracy, the hallucination-prone model produces outputs with significantly greater divergence from ground truth. Statistical validation using the Mann-Whitney U test confirms the significance of this difference (p < 0.00001). This work provides a quantitative framework for measuring hallucination bias and improving generative model evaluation and deepfake detection.
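The θ metric defined in the abstract, the ratio of the two models' reconstruction errors on the same input, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names are hypothetical, the models are assumed to be callables mapping an input array to its reconstruction, and mean squared error is assumed as the reconstruction-error measure (the abstract does not specify which error is used).

```python
import numpy as np

def reconstruction_error(model, x):
    """Mean squared error between the input and the model's reconstruction.
    `model` is assumed to be a callable returning a reconstruction of x."""
    x_hat = model(x)
    return float(np.mean((np.asarray(x) - np.asarray(x_hat)) ** 2))

def theta(hallucination_model, deepfake_model, x, eps=1e-12):
    """Divergence metric θ: ratio of the hallucination-prone model's
    reconstruction error to the deepfake model's error on the same input.
    θ > 1 indicates the hallucination model deviates more from the input."""
    e_h = reconstruction_error(hallucination_model, x)
    e_d = reconstruction_error(deepfake_model, x)
    return e_h / (e_d + eps)  # eps guards against division by zero
```

On a toy input where one stand-in model deviates by 0.2 per pixel and the other by 0.1, the per-pixel squared errors are 0.04 and 0.01, so θ ≈ 4, matching the intuition that a larger ratio flags stronger hallucination bias.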
Submission Number: 6